# Statistics 

# Key Terms for Data Types
## Numeric
Data that are expressed on a numeric scale.
### Continuous
Data that can take on any value in an interval. (Synonyms: interval, float,
numeric)
### Discrete
Data that can take on only integer values, such as counts. (Synonyms: integer,
count)
## Categorical
Data that can take on only a specific set of values representing a set of possible
categories. (Synonyms: enums, enumerated, factors, nominal)
### Binary
A special case of categorical data with just two categories of values, e.g., 0/1,
true/false. (Synonyms: dichotomous, logical, indicator, boolean)
### Ordinal
Categorical data that has an explicit ordering. (Synonym: ordered factor)

# Key Terms for Estimates of Location
## Mean
The sum of all values divided by the number of values. (Synonym: average)
## Weighted mean
The sum of all values times a weight divided by the sum of the weights. (Synonym: weighted average)
## Median
The value such that one-half of the data lies above and below. (Synonym: 50th percentile)
## Percentile
The value such that P percent of the data lies below. (Synonym: quantile)
## Weighted median
The value such that one-half of the sum of the weights lies above and below the
sorted data.
## Trimmed mean
The average of all values after dropping a fixed number of extreme values. (Synonym: truncated mean)
## Robust
Not sensitive to extreme values. (Synonym: resistant)
## Outlier
A data value that is very different from most of the data. (Synonym: extreme value)
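The location estimates above can be compared on a small, made-up array. The values and weights are purely illustrative. `weighted_median` is a hypothetical helper that implements one common convention: the smallest sorted value at which the cumulative weight reaches half of the total weight.

```python
import numpy as np
from scipy import stats

# Made-up data with one extreme value (100) and illustrative weights
values = np.array([2.0, 3.0, 5.0, 7.0, 100.0])
weights = np.array([1.0, 1.0, 2.0, 1.0, 0.5])

mean = values.mean()                                  # sum / n, pulled up by the outlier
weighted_mean = np.average(values, weights=weights)   # sum(w * x) / sum(w)
median = np.median(values)                            # middle of the sorted values
trimmed_mean = stats.trim_mean(values, 0.2)           # drop 20% (here one value) at each end


def weighted_median(x, w):
    """Hypothetical helper: smallest x whose cumulative weight reaches half the total."""
    order = np.argsort(x)
    x_sorted, w_sorted = np.asarray(x)[order], np.asarray(w)[order]
    cumulative = np.cumsum(w_sorted)
    idx = np.searchsorted(cumulative, 0.5 * cumulative[-1])
    return x_sorted[idx]


print(mean, weighted_mean, median, trimmed_mean, weighted_median(values, weights))
```

The single value 100 dominates the mean (23.4). The median, the trimmed mean and the weighted median all stay at 5, which is what "robust" means in the list above.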

# Key Terms for Variability Metrics
## Deviations
The difference between the observed values and the estimate of location. (Synonyms: errors, residuals)
## Variance
The sum of squared deviations from the mean divided by n – 1, where n is the
number of data values. (Synonym: mean-squared-error)
## Standard deviation
The square root of the variance.
## Mean absolute deviation
The mean of the absolute values of the deviations from the mean.
## Range
The difference between the largest and the smallest value in a data set.
## Order statistics
Metrics based on the data values sorted from smallest to biggest. (Synonym: ranks)
## Percentile
The value such that P percent of the values take on this value or less and (100–P)
percent take on this value or more. (Synonym: quantile)
## Interquartile range
The difference between the 75th percentile and the 25th percentile. (Synonym: IQR)
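A minimal sketch of the variability metrics above, computed with NumPy on made-up values. NumPy's `var`/`std` divide by n unless `ddof=1` is passed, while pandas' `var()`/`std()` use `ddof=1` (the n – 1 definition above) by default. For small samples, percentiles and the IQR also depend on the interpolation convention; NumPy's default is linear.

```python
import numpy as np

data = np.array([1.0, 4.0, 4.0, 5.0, 6.0])  # made-up example values

mean = data.mean()
deviations = data - mean                               # observed value minus estimate of location
variance = np.sum(deviations ** 2) / (len(data) - 1)   # divide by n - 1
std_dev = np.sqrt(variance)
mean_abs_dev = np.mean(np.abs(deviations))             # mean absolute deviation from the mean
data_range = data.max() - data.min()
q25, q75 = np.percentile(data, [25, 75])               # default: linear interpolation
iqr = q75 - q25

# np.var divides by n by default; ddof=1 gives the n - 1 version used above
assert np.isclose(variance, np.var(data, ddof=1))

print(variance, std_dev, mean_abs_dev, data_range, iqr)
```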

## BodyFat Dataset

#### Lists estimates of the percentage of body fat determined by underwater weighing and various body circumference measurements for 252 men.

Attributes: 

* Density determined from underwater weighing
* Percent body fat from Siri's (1956) equation
* Age (years)
* Weight (lbs) [1 lb = 0.453592 kg]
* Height (inches) [1 inch = 2.54 cm]
* Neck circumference (cm)
* Chest circumference (cm)
* Abdomen 2 circumference (cm)
* Hip circumference (cm)
* Thigh circumference (cm)
* Knee circumference (cm)
* Ankle circumference (cm)
* Biceps (extended) circumference (cm)
* Forearm circumference (cm)
* Wrist circumference (cm)
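Siri's (1956) equation converts body density D (g/cm³) into percent body fat as 495 / D - 450. The sketch below applies it together with the unit conversions listed above. The function names are hypothetical helpers, and 1.0708 is just an example density. Applying `siri_body_fat` to the Density column and comparing the result with BodyFat is a quick consistency check.

```python
def siri_body_fat(density):
    """Percent body fat from body density (g/cm^3) using Siri's (1956) equation."""
    return 495.0 / density - 450.0


def lbs_to_kg(lbs):
    return lbs * 0.453592  # 1 lb = 0.453592 kg


def inches_to_cm(inches):
    return inches * 2.54   # 1 inch = 2.54 cm


print(round(siri_body_fat(1.0708), 1))  # example density
print(lbs_to_kg(150), inches_to_cm(70))
```

Because these are plain arithmetic expressions, they also work element-wise on pandas columns, e.g. `siri_body_fat(bodyfat["Density"])`.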


In [399]:
import pandas as pd
home = "C:/.../bodyfat.csv"
bodyfat = pd.read_csv(home, skiprows=[1])  # skiprows=[1] skips line 1 of the file, i.e. the first line after the header

In [None]:

bodyfat.info()
bodyfat.shape

### Bodyfat DataFrame Description

In [None]:
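# Description: info() prints column names, dtypes and non-null counts;
# only the last expression of a cell (here describe()) is displayed automatically
bodyfat.info()
bodyfat.describe()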

In [None]:
bodyfat.head()

### Null Values?

In [None]:
print(bodyfat.isna().any())          # True for every column that contains at least one null value
print(bodyfat.isnull().sum())        # number of null values per column
print(bodyfat.isnull().sum().sum())  # total number of null values in the DataFrame


# Pandas Cut – Continuous to Categorical

* Some analyses become simpler once continuous data are converted into discrete categories.
* The pandas `cut` function converts continuous numerical data into categorical data. It takes three main arguments:
    * `x`: the 1-D input array or Series (e.g. a single DataFrame column) to be binned.
    * `bins`: the bin edges for the continuous data. By default the intervals are closed on the right, e.g. `bins=[0, 10, 20, ...]` gives (0, 10], (10, 20], etc.
    * `labels`: the names of the resulting categories, e.g. `["young", "middle", ...]`. There must be exactly one label fewer than there are bin edges.
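A small sketch of the bin edges and labels described above. It uses the same edges and labels as the BodyFat age grouping, applied to made-up ages. It shows that an age on the open left edge of the first interval receives no category unless `include_lowest=True` is passed.

```python
import pandas as pd

edges = [20, 35, 50, 65, 99]                              # 5 edges -> 4 intervals
labels = ["YoungAdults", "Adults", "Elderly", "Seniors"]  # exactly len(edges) - 1 labels

ages = pd.Series([22, 35, 36, 50, 64, 81])                # made-up ages
groups = pd.cut(ages, bins=edges, labels=labels)          # intervals (20, 35], (35, 50], ...
print(groups.tolist())

# 20 lies on the open left edge of (20, 35], so it gets no category (NaN);
# include_lowest=True closes the first interval on the left
print(pd.cut(pd.Series([20]), bins=edges, labels=labels).isna().all())
print(pd.cut(pd.Series([20]), bins=edges, labels=labels, include_lowest=True).tolist())
```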


In [None]:
bodyfat['Label'] = pd.cut(x=bodyfat['Age'], bins=[20, 35, 50, 65, 99], 
                     labels=['YoungAdults', 'Adults', 'Elderly', 'Seniors'])

bodyfat["Label"].value_counts()

In [None]:
import seaborn as sns  # seaborn is needed here already; it is imported again further below
sns.histplot(bodyfat, x="Weight", bins=10, hue="Label")

## Descriptive Statistical Functions

| No. | Function | Description |
|:--- |:--- |:--- |
| 1 | count() | Number of non-null observations |
| 2 | sum() | Sum of values |
| 3 | mean() | Mean of values |
| 4 | median() | Median of values |
| 5 | mode() | Mode of values |
| 6 | std() | Standard deviation of the values |
| 7 | min() | Minimum value |
| 8 | max() | Maximum value |
| 9 | abs() | Absolute value |
| 10 | prod() | Product of values |
| 11 | cumsum() | Cumulative sum |
| 12 | cumprod() | Cumulative product |
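The functions in the table can be tried on a tiny Series. The values below are made up and include one missing entry. By default these functions skip missing values (`skipna=True`). `std()` is the sample standard deviation (`ddof=1`), and `mode()` returns a Series because several values can tie.

```python
import math
import pandas as pd

s = pd.Series([1, 2, 2, 3, None])   # made-up values with one missing entry

print(s.count())                    # non-null observations
print(s.sum(), s.mean(), s.median())
print(s.mode().tolist())            # mode() returns a Series, since there can be ties
print(s.std(), math.sqrt(2 / 3))    # sample standard deviation (ddof=1)
print(s.min(), s.max(), s.prod())
print(s.cumsum().tolist())          # the missing value stays NaN in cumulative results
```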

## Statistical functions

In [None]:
print(bodyfat["Density"].mean(), bodyfat["Weight"].mean())
print(bodyfat["BodyFat"].median(), bodyfat["Height"].var())  # var() is the sample variance (ddof=1)
print(bodyfat["BodyFat"].quantile(0.5))                         # the 0.5 quantile equals the median
print(bodyfat[["BodyFat", "Weight"]].quantile([0.05, 0.25, 0.5, 0.75, 0.95]))

In [None]:
# bodyfat["BodyFat"].max()
bodyfat[bodyfat["BodyFat"] <= 5]                # all rows with BodyFat <= 5
bodyfat.loc[bodyfat["BodyFat"] <= 5, "Weight"]  # only the Weight column for BodyFat <= 5

# Key Terms for Exploring the Distribution
## Boxplot
A plot introduced by Tukey as a quick way to visualize the distribution of data. (Synonym: box and whiskers plot)
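With the default settings of matplotlib and seaborn (`whis=1.5`), the box spans the 25th to the 75th percentile. The whiskers extend to the most extreme data points that still lie within 1.5 × IQR of the box, and points beyond those fences are drawn individually as outliers ("fliers"). The sketch below applies that rule to made-up data. `tukey_fences` is a hypothetical helper name.

```python
import numpy as np


def tukey_fences(values, whis=1.5):
    """Hypothetical helper: lower/upper fences Q1 - whis*IQR and Q3 + whis*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr


data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])   # made-up data, 30 is far out
low, high = tukey_fences(data)
outliers = data[(data < low) | (data > high)]
inside = data[(data >= low) & (data <= high)]

print(low, high, outliers)          # fences and the points drawn as fliers
print(inside.min(), inside.max())   # where the whiskers end
```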
## Frequency table
A tally of the count of numeric data values that fall into a set of intervals (bins).
## Histogram
A plot of the frequency table with the bins on the x-axis and the count (or proportion)
on the y-axis. While visually similar, bar charts should not be confused
with histograms. See “Exploring Binary and Categorical Data” on page 27 for a
discussion of the difference.
## Density plot
A smoothed version of the histogram, often representing the probability density curve.
If no standard probability density function (PDF) fits the data, a kernel density estimate (KDE) plot is used to visualize the distribution of observations. A KDE represents the data with a continuous probability density curve in one or more dimensions.
https://www.aptech.com/blog/the-fundamentals-of-kernel-density-estimation/

# Plotting with Python/Matplotlib/Seaborn

- Matplotlib is a widely used Python library for creating 2D plots of arrays.
- Seaborn is a Python visualization library based on Matplotlib. It provides a high-level interface for drawing attractive statistical graphics.

In [None]:
# "%matplotlib notebook" leads to interactive plots embedded within the notebook
# "%matplotlib inline" leads to static images of your plots embedded in the notebook

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt # importing the matplotlib pyplot module

plt.style.use('dark_background') # using a plotting style
print(plt.style.available)

* A Matplotlib visualization almost always consists of Artists placed on a Figure.
* Matplotlib organizes every plot as a hierarchy of nested Python objects, i.e. a tree-like structure of objects underlying each plot.
* A Figure object is the outermost container of a Matplotlib plot and can contain one or more Axes objects. The Figure is the final graphic; each Axes represents an individual plot.
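The Figure → Axes → Artist hierarchy described above can be inspected directly. This sketch builds a `matplotlib.figure.Figure` through the class instead of `plt.subplots`. Figures created this way are not managed by pyplot, so the example also runs without a display. The data and title are arbitrary.

```python
from matplotlib.figure import Figure

# Build the hierarchy directly: one Figure holding two Axes
fig = Figure(figsize=(6, 3))
ax_left = fig.add_subplot(1, 2, 1)
ax_right = fig.add_subplot(1, 2, 2)

# Artists live inside an Axes: ax.plot returns a list of Line2D artists
(line,) = ax_left.plot([0, 1, 2], [0, 1, 4])
title = ax_right.set_title("right Axes")

print(len(fig.axes))          # the Figure knows its Axes
print(line.axes is ax_left)   # each Artist knows its Axes ...
print(ax_left.figure is fig)  # ... and each Axes knows its Figure
```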

In [None]:

# Here, plt.subplots has added four Axes artists (a 2 x 2 grid) to the Figure (see "Parts of a Figure" in the Matplotlib docs).
# A more complicated visualization can add multiple Axes, colorbars, legends and annotations to the Figure,
# and the Axes themselves can have multiple Artists added to them (e.g. ax.plot or ax.imshow).

fig, axs = plt.subplots(ncols=2, nrows=2, figsize=(3.5, 2.5),
                        layout="constrained")
# for each Axes, add an artist, in this case a nice label in the middle...
for row in range(2):
    for col in range(2):
        axs[row, col].annotate(f'axs[{row}, {col}]', (0.5, 0.5),
                            transform=axs[row, col].transAxes,
                            ha='center', va='center', fontsize=18,
                            color='darkgrey')

In [None]:
# Another figure with a single Axes and a title centered on the figure

fig, ax = plt.subplots(figsize=(10, 5), facecolor='lightgrey',
                       layout='compressed')  # other layout engines: 'constrained', 'tight'
fig.suptitle('Figure Plot',color="grey")
ax.set_title('Axes', loc='left', fontstyle='oblique', fontsize='medium', color="green") 
ax.plot(np.random.rand(100)) # plot a random series

#### Colors - Names and Hex Code

In [411]:
import math

import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle


def plot_colortable(colors, *, ncols=4, sort_colors=True):
    """Draw a table of color swatches, each labelled with its color name."""

    cell_width = 212
    cell_height = 22
    swatch_width = 48
    margin = 12

    # Sort colors by hue, saturation, value and name.
    if sort_colors:
        names = sorted(
            colors, key=lambda c: tuple(mcolors.rgb_to_hsv(mcolors.to_rgb(c))))
    else:
        names = list(colors)

    n = len(names)
    nrows = math.ceil(n / ncols)

    width = cell_width * ncols + 2 * margin
    height = cell_height * nrows + 2 * margin
    dpi = 72

    fig, ax = plt.subplots(figsize=(width / dpi, height / dpi), dpi=dpi)
    fig.subplots_adjust(margin/width, margin/height,
                        (width-margin)/width, (height-margin)/height)
    ax.set_xlim(0, cell_width * ncols)
    ax.set_ylim(cell_height * (nrows-0.5), -cell_height/2.)
    ax.yaxis.set_visible(False)
    ax.xaxis.set_visible(False)
    ax.set_axis_off()

    for i, name in enumerate(names):
        row = i % nrows
        col = i // nrows
        y = row * cell_height

        swatch_start_x = cell_width * col
        text_pos_x = cell_width * col + swatch_width + 7

        ax.text(text_pos_x, y, name, fontsize=14,
                horizontalalignment='left',
                verticalalignment='center')

        ax.add_patch(
            Rectangle(xy=(swatch_start_x, y-9), width=swatch_width,
                      height=18, facecolor=colors[name], edgecolor='0.7')
        )

    return fig


In [None]:
plot_colortable(mcolors.CSS4_COLORS)
plt.show()
mcolors.TABLEAU_COLORS
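
Named colors and hex codes are interchangeable in Matplotlib. This short sketch looks up a CSS4 name and a Tableau color and converts between formats with `matplotlib.colors`. The color names are examples taken from the plots in this notebook.

```python
import matplotlib.colors as mcolors

# Named colors map to hex codes; to_hex/to_rgb accept names, hex strings and RGB tuples
print(mcolors.CSS4_COLORS["peru"])         # hex code stored for the CSS4 name
print(mcolors.to_hex("peru"))              # normalized, lowercase hex
print(mcolors.to_rgb("#0000FF"))           # RGB floats in [0, 1]
print(mcolors.TABLEAU_COLORS["tab:blue"])  # first color of Matplotlib's default color cycle
```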

## Plotting Statistical Graphics with Seaborn/Seaborn Objects

* https://seaborn.pydata.org/tutorial/function_overview.html

In [413]:
import seaborn as sns
import seaborn.objects as so

sns.set_theme(style="darkgrid")


## Histogram

In [None]:
# Histogram with KDE = Kernel Density Estimation 
sns.histplot(bodyfat, x="Age",bins=60, kde=True, color="peru") 

### Plotting Multiple Histograms in One Plot

In [None]:
# Load the Titanic data (semicolon-separated, comma as decimal separator)

homedir = "C:/.../titanic.csv"
titanicdf = pd.read_csv(homedir, sep=';', decimal=',', skiprows=[1])  # skip line 1, i.e. the first line after the header
data1 = titanicdf.query("pclass == 1.0")['age'].dropna()    # equivalent to titanicdf[titanicdf['pclass'] == 1.0]['age'].dropna()
data2 = titanicdf.query("pclass == 3.0")['age'].dropna()

sns.histplot(data=data1, color='goldenrod', alpha=0.5, kde=False, label='First Class')
sns.histplot(data=data2, color='indigo', alpha=0.5, kde=True, label='Third Class')

# Adding labels and legend
plt.xlabel('Age')
plt.ylabel('Count')  # histplot shows counts by default (stat="count")
plt.title('Age distribution of Titanic passengers',fontsize=15,color='darkviolet',fontstyle='italic')
plt.legend()

plt.show()

In [None]:
# grouped histograms

colors = ['#0000FF', '#00FF00']
sns.histplot(data=titanicdf, x="age", hue="sex",multiple="stack", palette=colors)
plt.title('Age distribution of Titanic m/f passengers',fontsize=15,color='darkblue',fontstyle='italic')
plt.show()

In [None]:
# Random samples (n=50) of three body circumference measurements
d1 = bodyfat["Biceps"].sample(n=50, random_state=1)
d2 = bodyfat["Knee"].sample(n=50, random_state=1)
d3 = bodyfat["Thigh"].sample(n=50, random_state=1)


sns.histplot(data=d1, bins=10, color='grey', alpha=0.8, label='Biceps')
sns.histplot(data=d2, bins=10, color='orange', alpha=0.8, label='Knee')
sns.histplot(data=d3, bins=10, color='green', alpha=0.6, label='Thigh')

plt.xlabel('Circumference (cm)')
plt.ylabel('Count')
plt.legend()
plt.title('Sample distribution of various body circumference measures', fontsize=15, color='grey', fontstyle='italic')

plt.show()


## Boxplot

In [None]:
#Simple Boxplot
#plt.style.use('ggplot')

fig, ax = plt.subplots(figsize=(4, 8))
# plt.subplots returns the Figure and, for a single subplot, one Axes object.
# Passing ax=ax to the seaborn function draws the plot on that Axes,
# whose size was set with figsize.
sns.boxplot(data=bodyfat["BodyFat"], color='peru',ax=ax)

plt.title('BodyFat Distribution',fontsize=15,color='grey',fontstyle='italic')
plt.ylabel('BF in %')
plt.show()

In [None]:
# Boxplot with multiple categories
sns.boxplot(data=titanicdf, y='age', hue='pclass', palette='Set2', gap=0.5)
plt.title("Titanic: Age Distribution per class")

#### Multiple Plots of various attributes of BodyFat data 

In [None]:

fig, axes = plt.subplots(2, 3, figsize=(18, 10))

# Boxplots with median and mean (mean marker = "+")
sns.boxplot(ax=axes[0, 0], data=bodyfat["BodyFat"], color='#56bdff', showmeans=True, meanprops={"marker": "+", "markeredgecolor": "black", "markersize": 10})
sns.boxplot(ax=axes[0, 1], data=bodyfat["Hip"], color='#07335d', showmeans=True, meanprops={"marker": "+", "markeredgecolor": "white", "markersize": 10})
sns.boxplot(ax=axes[0, 2], data=bodyfat["Ankle"],color='#a9aaab')
sns.boxplot(ax=axes[1, 0], data=bodyfat["Weight"],color='#ff9856',notch=True)
sns.boxplot(ax=axes[1, 1], data=bodyfat["Biceps"],color='#ecd1cb',notch=True)
sns.histplot(ax=axes[1, 2], data=bodyfat["Weight"],color='#FAD02E')

# Set the title for the figure
plt.suptitle('Sample Distribution of various body circumference measures',fontsize=35,color='grey',fontstyle='italic')

# set the title to subplots
axes[0, 0].set_title("Boxplot of BodyFat")


#### Empirical cumulative distributions - Cumulative histograms
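An empirical cumulative distribution function (ECDF) needs no binning. Each sorted value is paired with the fraction of observations less than or equal to it. A minimal sketch on made-up values follows, where `ecdf` is a hypothetical helper. Seaborn (0.11 and later) draws the same step function with `sns.ecdfplot`.

```python
import numpy as np


def ecdf(values):
    """Hypothetical helper: sorted values and the fraction of data <= each value."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


x, y = ecdf([3, 1, 2, 2])   # made-up values
print(x, y)
```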

In [None]:
# Cumulative distribution of BodyFat

sns.histplot(data=bodyfat, x="BodyFat", element="step", cumulative=True, stat="density", common_norm=True, color='sandybrown')
#sns.histplot(data=bodyfat, x="BodyFat", element="poly", cumulative=True, stat="density", common_norm=True, color='sandybrown')

plt.title('Bodyfat cumulative distribution of sample', fontsize=15, color='darkblue', fontstyle='italic')

In [None]:
sns.histplot(
    data=titanicdf, x="age", hue="sex",
    element="step", cumulative=True, stat="density", common_norm=True
)
plt.title('Age distribution of Titanic m/f passengers',fontsize=15,color='darkblue',fontstyle='italic')

In [None]:
# Customize the plot using parameters of the underlying matplotlib function:
sns.boxplot(
    data=titanicdf, x="age", y="sex",
    notch=True, showcaps=False,
    flierprops={"marker": "x"},
    boxprops={"facecolor": (.3, .5, .7, .5)},
    medianprops={"color": "r", "linewidth": 2}
)

## KDE Plots

A kernel density estimate (KDE) plot is a method for visualizing the distribution of observations in a dataset, analogous to a histogram. KDE represents the data using a continuous probability density curve in one or more dimensions.
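Conceptually, a Gaussian KDE places a normal "bump" of width `bandwidth` on every observation and averages them. This is a hand-written sketch of that idea on made-up observations. `gaussian_kde_1d` is a hypothetical helper, not seaborn's implementation. Seaborn delegates to SciPy, picks the bandwidth with a rule of thumb (Scott's rule by default), and multiplies it by `bw_adjust`.

```python
import numpy as np


def gaussian_kde_1d(sample, grid, bandwidth):
    """Hypothetical helper: average of Gaussian bumps centred on each observation."""
    sample = np.asarray(sample, dtype=float)
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(sample) * bandwidth)


sample = [2.0, 3.0, 3.5, 7.0]              # made-up observations
grid = np.linspace(-10, 20, 3001)          # evaluation points, spacing 0.01
narrow = gaussian_kde_1d(sample, grid, bandwidth=0.5)
wide = gaussian_kde_1d(sample, grid, bandwidth=2.0)

dx = grid[1] - grid[0]
print(narrow.sum() * dx, wide.sum() * dx)  # both areas are close to 1
print(narrow.max() > wide.max())           # a larger bandwidth gives a flatter, smoother curve
```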

In [None]:
sns.kdeplot(data=bodyfat, x="BodyFat", fill=True, common_norm=False, linewidth=1, bw_adjust=1)
# sns.kdeplot(data=bodyfat, x="BodyFat", fill=True, common_norm=False, linewidth=0, bw_adjust=.1)
# bw_adjust (> 0) scales the bandwidth of the kernel density estimate; larger values give a smoother curve
plt.title('Bodyfat distribution of sample',fontsize=15,color='darkblue',fontstyle='italic')

In [None]:
sns.kdeplot(data=bodyfat,x="BodyFat", hue="Label", common_norm=False, palette="crest", alpha=.5, linewidth=2, fill=True)
plt.ylabel('ProbDensity')


### Violinplots

The default violinplot represents a distribution two ways: a patch showing a symmetric kernel density estimate (KDE), and the quartiles / whiskers of a box plot:

In [None]:
sns.violinplot(x=titanicdf["age"])

In [None]:
titanicdf.info()

In [None]:
# Violin plot of Titanic age distribution per passenger class, split by survival
# titanicdf1: a copy of titanicdf with the passenger class stored as a categorical column "class"
titanicdf1 = titanicdf.copy()
titanicdf1["class"] = titanicdf1["pclass"].astype("category")

#sns.violinplot(data=titanicdf1, x="age", y="class")
sns.violinplot(data=titanicdf1, x="class", y="age", hue="survived")

# Plotting Categorical Data

### Barplot

In [None]:
sns.countplot(data=bodyfat, x="Label")  # number of participants in each age group
plt.title('Age Groups of Bodyfat Study',fontsize=15,color='darkblue',fontstyle='italic')
plt.ylabel('Number of Participants')
plt.xlabel('Age Group')

In [None]:
# Data for the pie plot
parties = ['ÖVP', 'SPÖ', 'FPÖ', 'GRÜNE', 'NEOS',"REST"]
votes = [1282734, 1032234, 1408514, 402107, 446378,310921]
colors = ['grey', 'red', 'blue', 'green', 'pink','wheat']
explode = (0, 0, 0.1, 0, 0, 0)  # pull the 3rd slice ('FPÖ') out of the pie

# Create the pie plot
plt.figure(figsize=(4, 4))
plt.pie(votes, explode=explode, labels=parties, colors=colors, autopct='%1.1f%%', shadow=True, startangle=140)
plt.title('Austrian Election Results 2024', fontsize=15, color='grey', fontstyle='italic')
plt.show()

## Correlation /Scatter  Plots

- The scatter plot is a mainstay of statistical visualization. It depicts the joint distribution of two variables using a cloud of points, where each point represents an observation in the dataset. 
- The correlation coefficient is a metric that measures the extent to which numeric variables are associated with one another (it ranges from –1 to +1). Pearson's correlation coefficient is defined as
$$
r = \frac{\sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y})}
{\sqrt{\sum_{i=1}^{n} (x_i - \overline{x})^2 \sum_{i=1}^{n} (y_i - \overline{y})^2}}
$$
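The formula can be checked term by term on a few made-up pairs. With x̄ = 3 and ȳ = 4, the numerator is 6 and the two sums of squares are 10 and 6. `np.corrcoef`, `scipy.stats.pearsonr` and `DataFrame.corr()` (default method "pearson") compute the same quantity.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up paired observations
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

dx = x - x.mean()
dy = y - y.mean()
r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print(r)                        # 6 / sqrt(10 * 6)
print(np.corrcoef(x, y)[0, 1])  # NumPy's Pearson correlation for comparison
```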

In [None]:
sns.scatterplot(data=bodyfat, x="Age", y="BodyFat", color='blue', alpha=0.8)
plt.title('Correlation: BodyFat vs Age',fontsize=15,color='darkblue',fontstyle='italic')

In [None]:
sns.scatterplot(data=bodyfat, x="Weight", y="BodyFat", color="black", alpha=0.8)
plt.xlabel("Body Weight (lbs)")
plt.ylabel("Percent Body Fat")


### Scatterplot in "3" dimensions

Adding a further dimension to the plot

In [None]:
sns.scatterplot(data=bodyfat, x="Weight", y="BodyFat", hue="Label", palette="deep",style="Label")
plt.title('Correlation: BodyFat vs Weight of Age classes',fontsize=15,color='darkblue',fontstyle='italic')

### Adding Correlation Coefficient

In [None]:
from scipy import stats
# calculate Pearson's correlation coefficient between Weight and BodyFat
r = stats.pearsonr(bodyfat["Weight"], bodyfat["BodyFat"])[0]

#create scatterplot
sns.scatterplot(data=bodyfat, x="Weight", y="BodyFat", color="black")

#add correlation coefficient to plot
plt.text(250,45, 'Pearson Correlation: r = ' + str(round(r, 2)), fontsize=10, color='black',bbox=dict(facecolor='grey', alpha=0.5))

### Adding a Regression Line to the Correlation

In [None]:
# Add a BMI (Body Mass Index) column. With weight in lbs and height in inches:
# BMI [kg/m^2] = 703 * weight [lbs] / height [inch]^2

bodyfat1 = bodyfat.copy()

bodyfat1["BMI"] = 703 * bodyfat1["Weight"] / bodyfat1["Height"]**2

# Scatterplot of BMI vs BodyFat
sns.scatterplot(data=bodyfat1, x="BMI", y="BodyFat", color="blue")

# adding linear regression line
sns.regplot(data=bodyfat1, x="BMI", y="BodyFat", color="blue", scatter_kws={"s": 10})

plt.title("BMI vs Bodyfat")
plt.ylabel('Bodyfat')
plt.xlabel('BMI')
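
The factor 703 converts lb/inch² into kg/m². The sketch below derives it from the unit definitions given for the dataset (1 lb = 0.453592 kg, 1 inch = 2.54 cm). It then compares both BMI formulas for one illustrative weight and height; these values are made up, not taken from the data.

```python
LB_TO_KG = 0.453592   # kg per pound
INCH_TO_M = 0.0254    # metres per inch

# 1 lb / in^2 expressed in kg / m^2
factor = LB_TO_KG / INCH_TO_M ** 2
print(round(factor, 2))   # rounded to 703 for the BMI column

weight_lbs, height_in = 154.25, 67.75   # illustrative values
bmi_imperial = 703 * weight_lbs / height_in ** 2
bmi_metric = (weight_lbs * LB_TO_KG) / (height_in * INCH_TO_M) ** 2
print(bmi_imperial, bmi_metric)   # differ only by the rounding of the factor
```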


### Detecting and Removing Outliers

In [436]:
bmi_threshold = 50  # BMI values above this threshold are treated as outliers
print(bodyfat1[bodyfat1["BMI"] > bmi_threshold])  # show the outlier rows

bodyfat2 = bodyfat1[bodyfat1["BMI"] < bmi_threshold]  # keep only the rows below the threshold

### Scatterplot without outliers

In [None]:
# Scatterplot of BMI vs BodyFat
sns.scatterplot(data=bodyfat2, x="BMI", y="BodyFat", color="blue")

# adding linear regression line
sns.regplot(data=bodyfat2, x="BMI", y="BodyFat", color="blue", scatter_kws={"s": 10},ci=95)

plt.title("BMI vs Bodyfat")
plt.ylabel('Bodyfat')
plt.xlabel('BMI')

#### Emphasizing continuity with line plots

When x is an ordered, continuous variable such as time, a line plot emphasizes continuity better than a scatter plot.

In [None]:
dowjones = sns.load_dataset("dowjones")
sns.relplot(data=dowjones, x="Date", y="Price", kind="line")

## Pair plots
- Pair plots are very useful for exploring correlations in multidimensional data: they plot every pair of variables against each other.

- Some examples for various layouts

In [None]:
sns.pairplot(data=bodyfat1, vars=["Weight", "Height", "BodyFat"], hue="Label", palette="deep")

In [None]:
sns.pairplot(data=bodyfat1, vars=["Weight", "Height", "BodyFat"], hue="Label", diag_kind="hist", palette="deep")

In [None]:
sns.pairplot(data=bodyfat1, vars=["Weight", "Height", "BodyFat"], kind="kde")

## Visualizing many attributes in one plot

In [None]:
import scipy.stats as stats  # probplot draws a Q-Q plot against the normal distribution

plt.style.use('ggplot')
colors=['#ffcd94','#eac086','#ffad60','#ffe39f']
sns.set_palette(sns.color_palette(colors))

fig,ax = plt.subplots(15,3,figsize=(30,100))
for index,i in enumerate(bodyfat.columns[0:15]):
    sns.histplot(bodyfat[i],ax=ax[index,0])
    sns.boxplot(bodyfat[i],ax=ax[index,1])
    stats.probplot(bodyfat[i],plot=ax[index,2])

# fig.tight_layout()
fig.subplots_adjust(top=0.95)
plt.suptitle("Visualizing Continuous Columns",fontsize=50)

In [None]:
bodyfat.corr(numeric_only=True)  # numeric_only skips the categorical "Label" column

### Heatmap 
- In a correlation heatmap, each variable is represented by a row and a column, and each cell shows the correlation between the two variables. The cell color encodes the strength and sign of the correlation according to the color bar; with `annot=True` the coefficient is also printed in each cell.

In [None]:
plt.style.use('ggplot') # using a plotting style
plt.figure(figsize=(10,10))
sns.heatmap(bodyfat.corr(numeric_only=True), annot=True, linewidth=0.3, fmt="0.2f")

In [None]:
# for correlation matrix remove the non-numeric columns 
bodyfat = bodyfat.drop(columns=["Label"])

### Search for high correlations

In [None]:
# search for the highest correlation between two different columns
bfcorr = bodyfat.corr()
max_corr = 0
col1, col2 = None, None
for i in range(len(bfcorr)):  # all rows
    for j in range(len(bfcorr)):  # all columns
        # only consider correlations greater than 0.8, skipping the diagonal (i == j)
        if i != j and bfcorr.iloc[i, j] > 0.8:
            if bfcorr.iloc[i, j] > max_corr:
                max_corr = bfcorr.iloc[i, j]
                col1 = bfcorr.columns[i]
                col2 = bfcorr.columns[j]

print(col1, col2, max_corr)

In [None]:
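# calculate the correlation matrix
bfcorr = bodyfat.corr()

# list all pairs from highest to lowest correlation; keeping values < 1 removes the diagonal,
# drop_duplicates removes the mirrored (j, i) pairs
corrmax = bfcorr.unstack().sort_values(ascending=False)
print(corrmax[corrmax < 1].drop_duplicates())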
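
Another way to list each pair once is to keep only the upper triangle of the correlation matrix. This also avoids `drop_duplicates`, which would remove distinct pairs that happen to share the same value. The sketch uses a hypothetical helper `top_correlations` on a small made-up DataFrame. The `numeric_only` argument of `DataFrame.corr` requires pandas 1.5 or later.

```python
import numpy as np
import pandas as pd


def top_correlations(df, n=5):
    """Hypothetical helper: the n largest correlations between distinct column pairs."""
    corr = df.corr(numeric_only=True)
    # keep only the upper triangle above the diagonal, so every pair appears once
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).unstack().dropna()
    return pairs.sort_values(ascending=False).head(n)


demo = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]})  # made-up data
print(top_correlations(demo))
```

Applied to the BodyFat data, `top_correlations(bodyfat)` would list the most strongly correlated pairs of measurements.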

In [None]:
plt.style.use('ggplot') # using a plotting style
plt.figure(figsize=(10,10))
sns.heatmap(bodyfat.corr(),annot=True,linewidth=0.3,fmt="0.2f")