### Quiz 7
### Recursion Example (tree and stack) in Canvas 

## IPython features

IPython has a set of predefined ‘magic functions’ that you can call with a command-line-style syntax. There are two kinds of magics: line-oriented and cell-oriented. Line magics are prefixed with a single % character and work much like OS command-line calls: they receive the rest of the line as their argument, with arguments passed without parentheses or quotes. Line magics can return results and can be used on the right-hand side of an assignment. Cell magics are prefixed with a double %%; they receive not only the rest of the line but also the lines below it, as a separate argument. For example, `%matplotlib inline` in the import cell below is a line magic.

[Pandas](https://pandas.pydata.org/pandas-docs/version/0.15/tutorials.html)

<b> Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language. </b>

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline 


The following code illustrates some descriptive analysis using the [Iris Data Set][1]. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. 

![image.png](attachment:image.png)

Attribute Information:

1. sepal length in cm 
2. sepal width in cm 
3. petal length in cm 
4. petal width in cm 
5. class: 
    - Iris Setosa 
    - Iris Versicolour 
    - Iris Virginica



[1]:https://archive.ics.uci.edu/ml/datasets/iris

## [Dataframe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)

Two-dimensional, size-mutable, potentially heterogeneous tabular data.      

Data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects.       
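A minimal sketch of the two properties above, using small hypothetical frames rather than the Iris data: arithmetic lines rows up by label (not by position), and each column behaves like a dict entry holding a Series.

```python
import pandas as pd

# Toy data (hypothetical): two DataFrames whose row labels only partly overlap
a = pd.DataFrame({'x': [1, 2, 3]}, index=['r1', 'r2', 'r3'])
b = pd.DataFrame({'x': [10, 20, 30]}, index=['r2', 'r3', 'r4'])

# Addition matches rows by label, not by position;
# labels present in only one operand produce NaN
total = a + b
print(total)

# Each column is a Series, looked up like a dict key
col = a['x']
print(type(col))
```

Rows `r1` and `r4` come out as `NaN` because each exists in only one of the two frames.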

In [None]:
help(pd.DataFrame)

In [None]:
# Read a CSV file into a pandas DataFrame
iris_df = pd.read_csv('./Iris.csv')

In [None]:
#Columns
iris_df.columns

In [None]:
iris_df

In [None]:
# Create an empty DataFrame (it is overwritten in the next cell)
tempDataFrame = pd.DataFrame()

In [None]:
# Drop the 'Id' column; drop() returns a new DataFrame and leaves iris_df unchanged
tempDataFrame = iris_df.drop(['Id'], axis=1)
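A small sketch (toy frame with hypothetical values standing in for `iris_df`) showing that `drop()` does not modify the original, and that `columns=` is an equivalent way to say `axis=1`:

```python
import pandas as pd

# Toy frame standing in for iris_df (hypothetical values)
df = pd.DataFrame({'Id': [1, 2], 'SepalLengthCm': [5.1, 4.9]})

# drop() returns a new DataFrame by default; df itself is unchanged
without_id = df.drop(['Id'], axis=1)
# Equivalent keyword form: name the columns directly instead of using axis=1
also_without_id = df.drop(columns=['Id'])

print(list(df.columns))
print(list(without_id.columns))
```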

In [None]:
#retrieve the first 5 rows
tempDataFrame.head()

In [None]:
# Back to the original DataFrame
# Select a subset of columns by name (note the double brackets: a list of names)
iris_df[['SepalLengthCm', 'SepalWidthCm']]

In [None]:
# Select the first three rows by position (slice)
iris_df[:3]
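The slice `iris_df[:3]` is positional. A sketch on a toy frame with a string index (hypothetical values) compares it with the explicit `.iloc` (position) and `.loc` (label) forms; note that a `.loc` slice includes its end label.

```python
import pandas as pd

# Toy frame with a non-default index so positions and labels differ
df = pd.DataFrame({'v': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])

first_three = df[:3]        # integer slice inside [] selects rows by position
by_position = df.iloc[:3]   # explicit positional slice, end position excluded
by_label = df.loc['a':'c']  # label slice, end label INCLUDED

print(first_three.equals(by_position), first_three.equals(by_label))
```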

In [None]:
iris_df

In [None]:
iris_df.shape

In [None]:
iris_df['SepalLengthCm'].plot()

In [None]:
#counts for categorical values.
iris_df['Species'].value_counts()
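`value_counts()` can also report proportions instead of raw counts through its `normalize` parameter. A sketch on a toy Series (hypothetical labels standing in for `iris_df['Species']`):

```python
import pandas as pd

# Toy labels (hypothetical) standing in for iris_df['Species']
species = pd.Series(['setosa', 'setosa', 'virginica', 'versicolor'])

counts = species.value_counts()                # absolute counts per class
shares = species.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```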

In [None]:
# Plot the class counts as a pie chart
iris_df['Species'].value_counts().plot(kind='pie')

In [None]:
iris_df[iris_df['PetalWidthCm']<=0.5]

In [None]:
len(iris_df[iris_df['PetalWidthCm']<=0.5])
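The filter inside the brackets is a boolean Series (a "mask"). A sketch on toy values (hypothetical, not the Iris data) shows that summing the mask counts matching rows directly, since `True` counts as 1:

```python
import pandas as pd

# Toy values (hypothetical) standing in for iris_df['PetalWidthCm']
df = pd.DataFrame({'PetalWidthCm': [0.2, 0.4, 1.3, 2.1, 0.5]})

# The comparison yields one True/False per row
mask = df['PetalWidthCm'] <= 0.5
# Indexing with the mask keeps only the rows where it is True
small = df[mask]
# Summing the mask counts the matches without building a filtered frame
n_small = int(mask.sum())

print(n_small, len(small))
```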

In [None]:
iris_df[iris_df['PetalWidthCm']<=0.5]['SepalWidthCm']

In [None]:
type(iris_df[iris_df['PetalWidthCm']<=0.5]['SepalWidthCm'])

In [None]:
help(pd.Series)

In [None]:
iris_df[iris_df['PetalWidthCm']<=0.5]['SepalWidthCm'].sum()

In [None]:
iris_df[iris_df['PetalWidthCm']<=0.5]['SepalWidthCm'].mean()

In [None]:
iris_df[(iris_df['PetalWidthCm']<=0.5) & (iris_df['PetalLengthCm'] <=1.2) ]
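Conditions are combined element-wise with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses because `&` binds more tightly than `<=` in Python, and the keyword `and` does not work on a Series. A sketch on a toy frame (hypothetical values):

```python
import pandas as pd

# Toy frame (hypothetical values)
df = pd.DataFrame({'PetalWidthCm': [0.2, 0.2, 1.3],
                   'PetalLengthCm': [1.0, 1.4, 4.5]})

narrow = df['PetalWidthCm'] <= 0.5
short = df['PetalLengthCm'] <= 1.2

both = df[narrow & short]         # rows satisfying both conditions
either = df[narrow | short]       # rows satisfying at least one
neither = df[~(narrow | short)]   # rows satisfying none

# `and` tries to turn a whole Series into a single bool and raises ValueError
raised = False
try:
    df[(df['PetalWidthCm'] <= 0.5) and (df['PetalLengthCm'] <= 1.2)]
except ValueError as err:
    raised = True
    print('ValueError:', err)
```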

In [None]:
help(np.where)

In [None]:
# np.where(condition) returns a tuple with one array of matching positions per dimension
idx = np.where((iris_df['PetalWidthCm'] <= 0.5) & (iris_df['PetalLengthCm'] <= 1.2))
print(idx, type(idx))
print(type(idx[0]))

In [None]:
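# np.where gives positions, so select with .iloc
# (.loc gives the same rows here only because iris_df has the default 0..n-1 index)
iris_df.iloc[idx[0]]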
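The difference between positions and labels only shows up once the index is not the default `0..n-1`. A sketch on a toy frame with a hypothetical index `[10, 11, 12]`:

```python
import numpy as np
import pandas as pd

# Toy frame with a non-default index (hypothetical values)
df = pd.DataFrame({'PetalWidthCm': [0.2, 1.5, 0.3]}, index=[10, 11, 12])

# With only a condition, np.where returns a tuple of position arrays
idx = np.where(df['PetalWidthCm'] <= 0.5)
positions = idx[0]  # positions 0 and 2, not the labels 10 and 12

# Positions go with .iloc; .loc would look for labels 0 and 2 and fail
rows = df.iloc[positions]
# The boolean mask selects the same rows without going through NumPy
same_rows = df[df['PetalWidthCm'] <= 0.5]

print(rows)
```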

[More about conditions in pandas](https://kanoki.org/2020/01/21/pandas-dataframe-filter-with-multiple-conditions/)

In [None]:
iris_df.drop("Id", axis=1).describe()

In [None]:
#Draw the histograms of numeric features
iris_df.drop("Id", axis=1).hist(bins=50, figsize=(20,15))

In [None]:
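# Compute the Pearson pairwise correlation of the numeric features
# (Species is text, so it is dropped as well; recent pandas versions raise an error on non-numeric columns)
corr_matrix = iris_df.drop(["Id", "Species"], axis=1).corr()
print(corr_matrix)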
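Pearson's r for two features is their covariance divided by the product of their standard deviations:

$$r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$

A sketch that computes this by hand on toy data (hypothetical values; `pearson` is an illustrative helper, not a pandas function) and compares it with `DataFrame.corr()`, whose default method is Pearson:

```python
import numpy as np
import pandas as pd

# Toy measurements (hypothetical), not the Iris data
df = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0],
                   'b': [2.0, 4.1, 5.9, 8.2]})

def pearson(x, y):
    """Hypothetical helper: Pearson r from the centred sums in the formula above."""
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_manual = pearson(df['a'], df['b'])
r_pandas = df.corr().loc['a', 'b']  # DataFrame.corr defaults to method='pearson'

print(r_manual, r_pandas)
```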

[Heatmap](https://en.wikipedia.org/wiki/Heat_map)

In [None]:
import matplotlib.pyplot as plt

# Feature names in the same order as the rows/columns of corr_matrix
feature_names = corr_matrix.columns
ticks = np.arange(len(feature_names))

fig = plt.figure()
ax = fig.add_subplot(111)
# Diverging colormap fixed to [-1, 1], the range of a correlation coefficient
cax = ax.matshow(corr_matrix, cmap='coolwarm', vmin=-1, vmax=1)
fig.colorbar(cax)
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_xticklabels(feature_names, rotation=90)
ax.set_yticklabels(feature_names)
plt.show()

In [None]:
# Visualize the relationship between pairs of features with respect to the outcome (class)
# (seaborn renamed pairplot's `size` argument to `height`)
sns.pairplot(iris_df.drop("Id", axis=1), hue="Species", height=3)

## [Group by](https://realpython.com/pandas-groupby/)

In [None]:
iris_df.groupby(['Species'])['SepalLengthCm'].sum()
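`groupby` follows a split-apply-combine pattern: rows are split by key, an aggregation is applied to each group, and the results are combined into one object indexed by the keys. A sketch on a toy frame (hypothetical values), including `.agg()` to compute several statistics at once:

```python
import pandas as pd

# Toy frame (hypothetical values) with a categorical key and a numeric column
df = pd.DataFrame({'Species': ['setosa', 'setosa', 'virginica'],
                   'SepalLengthCm': [5.0, 4.8, 6.3]})

# Split by Species, sum within each group, combine into a Series indexed by Species
totals = df.groupby('Species')['SepalLengthCm'].sum()
# Several aggregations at once: one column per statistic
summary = df.groupby('Species')['SepalLengthCm'].agg(['count', 'mean', 'max'])

print(totals)
print(summary)
```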

In [None]:
# Load the heart data set into a new DataFrame
heart_df = pd.read_csv('./heart.csv')

In [None]:
heart_df.head()

In [None]:
heart_df.groupby(['sex','cht'])['oldpeak'].sum()

In [None]:
heart_df[heart_df['age']>=70]

In [None]:
heart_df[heart_df['age']>=70].groupby(['sex','cht'])['oldpeak'].count()

In [None]:
heart_df[heart_df['age']>=70].groupby(['sex','cht'])['oldpeak'].sum()

In [None]:
np.array(heart_df[heart_df['age']>=70].groupby(['sex','cht'])['oldpeak'].sum())

In [None]:
list(np.array(heart_df[heart_df['age']>=70].groupby(['sex','cht'])['oldpeak'].sum()))
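Pandas objects also provide `.to_numpy()` and `.tolist()` for the same conversions. A sketch on a toy frame (hypothetical values; the column names only mirror the cells above) that also recovers the `(sex, cht)` key behind each value from the two-level index:

```python
import pandas as pd

# Toy frame (hypothetical values) mirroring the heart_df column names used above
df = pd.DataFrame({'sex': [0, 0, 1], 'cht': [1, 1, 2], 'oldpeak': [1.5, 0.5, 2.5]})

grouped = df.groupby(['sex', 'cht'])['oldpeak'].sum()  # Series with a 2-level MultiIndex

as_array = grouped.to_numpy()  # NumPy array of the values (index labels dropped)
as_list = grouped.tolist()     # plain Python list of the values
keys = grouped.index.tolist()  # the (sex, cht) tuple behind each value

print(list(zip(keys, as_list)))
```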

## Create a new DataFrame from a dictionary and save it to a file

In [None]:
data = {'Age': [30, 12, 90, 83], 'Sex': ['F', 'M', 'M', 'F']}
myPD=pd.DataFrame.from_dict(data)

In [None]:
myPD

In [None]:
myPD.to_csv('myPD.csv',index=False)
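A quick way to check the saved file is to read it back and compare. This sketch writes to a temporary directory (an assumption, so the example leaves no file behind) instead of `myPD.csv` in the working directory:

```python
import os
import tempfile

import pandas as pd

data = {'Age': [30, 12, 90, 83], 'Sex': ['F', 'M', 'M', 'F']}
myPD = pd.DataFrame.from_dict(data)

# Write to a temporary directory so the example leaves no file behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'myPD.csv')
    myPD.to_csv(path, index=False)  # index=False: do not write row labels as an extra column
    reloaded = pd.read_csv(path)

print(reloaded.equals(myPD))
```

Without `index=False`, the row labels would be written as an extra unnamed first column, which `read_csv` would load as a data column.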