# Pandas

Pandas is a Python library used for data manipulation and analysis. It offers data structures and operations for manipulating numerical tables and time series.

A DataFrame is a two-dimensional tabular data structure with labeled axes (rows and columns). It can be created from various data sources like lists, dictionaries, NumPy arrays, or CSV files.
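
As a minimal sketch of the creation routes mentioned above, the snippet below builds the same small table from a dictionary, from a list of rows, and from a NumPy array. The variable names and values are illustrative only and do not come from any dataset used in this notebook.

```python
import numpy as np
import pandas as pd

# From a dictionary: keys become column names.
df_from_dict = pd.DataFrame({"Col1": [0, 4], "Col2": [1, 5]})

# From a list of rows, with explicit column names.
df_from_list = pd.DataFrame([[0, 1], [4, 5]], columns=["Col1", "Col2"])

# From a NumPy array, with explicit row and column labels.
df_from_array = pd.DataFrame(
    np.arange(4).reshape(2, 2),
    index=["Row1", "Row2"],
    columns=["Col1", "Col2"],
)

# The dictionary and list versions hold identical data and labels.
print(df_from_dict.equals(df_from_list))
print(df_from_array)
```

When no `index` is passed, pandas assigns a default integer index starting at 0.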

**Features of Pandas DataFrame:**

- **Data Alignment:** It automatically aligns data based on row and column labels, making data manipulation easier.
- **Data Selection:** Data can be selected using various methods like indexing, slicing, and filtering.
- **Data Manipulation:** Operations like adding, dropping, modifying columns, and merging DataFrames can be performed easily.
- **Statistical Analysis:** Pandas provides various statistical methods for analyzing data, such as calculating mean, standard deviation, and correlation.
- **Data Visualization:** It integrates with visualization libraries like Matplotlib and Seaborn for creating informative plots and charts.
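
The alignment, manipulation, and statistics features listed above can be sketched on toy data. The Series `a` and `b` and the DataFrame `frame` are hypothetical names chosen for this illustration.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "w"])

# Addition aligns on labels, not on position.
# Labels present on only one side ("x" and "w") produce NaN.
total = a + b

frame = pd.DataFrame({"Col1": [0, 4, 8], "Col2": [1, 5, 9]})
frame["Col3"] = frame["Col1"] + frame["Col2"]  # add a derived column
frame = frame.drop(columns=["Col2"])           # drop a column
col_means = frame.mean()                       # per-column mean

print(total)
print(col_means)
```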

In [2]:
import pandas as pd

In [3]:
import numpy as np

In [4]:
df=pd.DataFrame(np.arange(0,20).reshape(5,4),index=["Row1","Row2","Row3","Row4","Row5"],columns=["Col1","Col2","Col3","Col4"])

In [5]:
df.head()

Unnamed: 0,Col1,Col2,Col3,Col4
Row1,0,1,2,3
Row2,4,5,6,7
Row3,8,9,10,11
Row4,12,13,14,15
Row5,16,17,18,19


In [6]:
df.to_csv('Test1.csv')
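
`to_csv` writes the row labels as the first column of the file. As a hedged sketch, the round trip below uses an in-memory buffer instead of `Test1.csv` (an assumption made so the example leaves no file behind) and passes `index_col=0` so the labels are read back as the index rather than as an extra unnamed column.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3", "Col4"],
)

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer)

# Rewind and read back, treating the first column as the row index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(restored.head())
```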

In [7]:
type(df.loc["Row1"])

In [8]:
df.iloc[0:1,0:2]

Unnamed: 0,Col1,Col2
Row1,0,1


In [9]:
type(df.iloc[0:1,0:2])
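
The two `type(...)` cells above probe what each selection returns. A short sketch, rebuilding the same 5x4 frame so it runs on its own: a single row label gives a Series, while slices give a DataFrame even when only one row is selected.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3", "Col4"],
)

row = df.loc["Row1"]        # single label: one-dimensional Series
block = df.iloc[0:1, 0:2]   # positional slices: DataFrame, end-exclusive

# Label-based slices with .loc include both endpoints.
same_block = df.loc["Row1":"Row1", "Col1":"Col2"]

print(type(row))
print(type(block))
print(block.equals(same_block))
```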

In [10]:
df["Col1"].unique()

array([ 0,  4,  8, 12, 16])

In [12]:
df=pd.read_csv('Merc.csv')

In [13]:
df

Unnamed: 0,ID,y,X0,X1,X2,X3,X4,X5,X6,X8,...,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
0,0,130.81,k,v,at,a,d,u,j,o,...,0,0,1,0,0,0,0,0,0,0
1,6,88.53,k,t,av,e,d,y,l,o,...,1,0,0,0,0,0,0,0,0,0
2,7,76.26,az,w,n,c,d,x,j,x,...,0,0,0,0,0,0,1,0,0,0
3,9,80.62,az,t,n,f,d,x,l,e,...,0,0,0,0,0,0,0,0,0,0
4,13,78.02,az,v,n,f,d,h,d,n,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4204,8405,107.39,ak,s,as,c,d,aa,d,q,...,1,0,0,0,0,0,0,0,0,0
4205,8406,108.77,j,o,t,d,d,aa,h,h,...,0,1,0,0,0,0,0,0,0,0
4206,8412,109.22,ak,v,r,a,d,aa,g,e,...,0,0,1,0,0,0,0,0,0,0
4207,8415,87.48,al,r,e,f,d,aa,l,u,...,0,0,0,0,0,0,0,0,0,0


In [17]:
df[df['y']>10]

Unnamed: 0,ID,y,X0,X1,X2,X3,X4,X5,X6,X8,...,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
0,0,130.81,k,v,at,a,d,u,j,o,...,0,0,1,0,0,0,0,0,0,0
1,6,88.53,k,t,av,e,d,y,l,o,...,1,0,0,0,0,0,0,0,0,0
2,7,76.26,az,w,n,c,d,x,j,x,...,0,0,0,0,0,0,1,0,0,0
3,9,80.62,az,t,n,f,d,x,l,e,...,0,0,0,0,0,0,0,0,0,0
4,13,78.02,az,v,n,f,d,h,d,n,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4204,8405,107.39,ak,s,as,c,d,aa,d,q,...,1,0,0,0,0,0,0,0,0,0
4205,8406,108.77,j,o,t,d,d,aa,h,h,...,0,1,0,0,0,0,0,0,0,0
4206,8412,109.22,ak,v,r,a,d,aa,g,e,...,0,0,1,0,0,0,0,0,0,0
4207,8415,87.48,al,r,e,f,d,aa,l,u,...,0,0,0,0,0,0,0,0,0,0
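
Every `y` value shown here exceeds 10, so the filter above appears to keep all displayed rows. To illustrate how boolean masks narrow a frame, here is a sketch on a small stand-in table. The name `cars` is hypothetical, and only the `ID`, `y`, and `X0` values of the first four displayed rows are reused.

```python
import pandas as pd

# Stand-in for a few rows of Merc.csv (only three of its columns).
cars = pd.DataFrame(
    {
        "ID": [0, 6, 7, 9],
        "y": [130.81, 88.53, 76.26, 80.62],
        "X0": ["k", "k", "az", "az"],
    }
)

# A single condition yields a boolean Series used as a row mask.
slow = cars[cars["y"] > 100]

# Combine conditions with & (and) or | (or); each needs its own parentheses.
mask = (cars["y"] > 80) & (cars["X0"] == "az")

# .loc accepts the mask together with a column selection.
subset = cars.loc[mask, ["ID", "y"]]

print(slow)
print(subset)
```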


In [18]:
df.corr(numeric_only=True)


Unnamed: 0,ID,y,X10,X11,X12,X13,X14,X15,X16,X17,...,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
ID,1.000000,-0.055108,0.001602,,0.058988,-0.031917,-0.025438,0.002237,-0.036480,-0.038171,...,0.045229,-0.080259,-0.022965,0.030371,0.023382,-0.013577,-0.038171,-0.009332,-0.015355,0.029059
y,-0.055108,1.000000,-0.026985,,0.089792,0.048276,0.193643,0.023116,0.048946,-0.159815,...,0.029100,0.114005,0.061403,-0.258679,0.067919,0.040932,-0.159815,0.040291,-0.004591,-0.022280
X10,0.001602,-0.026985,1.000000,,-0.033084,-0.028806,-0.100474,-0.002532,-0.005944,-0.010164,...,0.165277,-0.028618,-0.074244,-0.016870,-0.011374,-0.010479,-0.010164,-0.004740,-0.002532,-0.004387
X11,,,,,,,,,,,...,,,,,,,,,,
X12,0.058988,0.089792,-0.033084,,1.000000,0.214825,-0.246513,-0.006212,-0.014584,-0.024937,...,-0.107864,-0.070214,0.030134,-0.016043,-0.027907,-0.005566,-0.024937,-0.011628,-0.006212,-0.010765
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
X380,-0.013577,0.040932,-0.010479,,-0.005566,0.023045,0.007743,-0.001968,-0.004619,-0.007899,...,-0.061741,-0.022240,-0.061168,-0.013110,-0.008839,1.000000,-0.007899,-0.003683,-0.001968,-0.003410
X382,-0.038171,-0.159815,-0.010164,,-0.024937,-0.021713,0.012713,-0.001908,-0.004480,1.000000,...,-0.059883,-0.021571,-0.059327,-0.012716,-0.008573,-0.007899,1.000000,-0.003572,-0.001908,-0.003307
X383,-0.009332,0.040291,-0.004740,,-0.011628,-0.010125,0.023604,-0.000890,-0.002089,-0.003572,...,-0.015413,-0.010059,0.035107,-0.005930,-0.003998,-0.003683,-0.003572,1.000000,-0.000890,-0.001542
X384,-0.015355,-0.004591,-0.002532,,-0.006212,0.041242,0.025199,-0.000475,-0.001116,-0.001908,...,-0.014917,-0.005373,0.008694,-0.003168,-0.002136,-0.001968,-0.001908,-0.000890,1.000000,-0.000824
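
The correlation table above contains only numeric columns, and the `X11` row is entirely NaN. One plausible cause, not verified against `Merc.csv` here, is a column with zero variance. The sketch below reproduces both behaviors on a hypothetical frame named `sample`, assuming a pandas version that supports the `numeric_only` parameter.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "y": [1.0, 2.0, 3.0, 4.0],
        "double_y": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with y
        "constant": [0, 0, 0, 0],          # zero variance
        "label": ["a", "b", "c", "d"],     # non-numeric, excluded below
    }
)

# numeric_only=True skips the string column instead of raising an error.
corr = sample.corr(numeric_only=True)

print(corr)
```

The correlation coefficient divides by each column's standard deviation, so a constant column yields NaN against every other column.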


## Pickle

Pickle is a Python module used for object serialization and deserialization. It converts Python objects into a binary format that can be stored, transmitted, and later reconstructed into the original object.

**Features of Pickle:**

- **Serialization:** It converts Python objects into a binary stream that can be written to a file or transmitted over a network.
- **Deserialization:** It reconstructs the original Python object from the binary stream.
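- **Cross-Platform:** Pickled data is portable across operating systems. Compatibility across Python versions depends on the pickle protocol used: newer interpreters can read older protocols, but older interpreters cannot read newer ones.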
- **Supports Various Data Types:** It can handle various data types, including custom objects, lists, dictionaries, and NumPy arrays.
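
The serialize and deserialize steps listed above can be sketched with the standard `pickle` module and with the pandas helpers. The names `record` and `frame` and the temporary file path are assumptions made for this illustration.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd

# Serialize a plain Python object to bytes, then rebuild it.
record = {"name": "Row1", "values": [0, 1, 2, 3]}
payload = pickle.dumps(record)
restored = pickle.loads(payload)  # only unpickle data from a trusted source

# DataFrames expose the same round trip through to_pickle / read_pickle.
frame = pd.DataFrame({"ID": [0, 6], "y": [130.81, 88.53]})
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "frame.pkl"
    frame.to_pickle(path)
    frame_back = pd.read_pickle(path)

print(restored == record)
print(frame_back.equals(frame))
```

Unlike a CSV round trip, pickling preserves the index and column dtypes without extra arguments.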


In [19]:
df.to_pickle('df_merc')

In [20]:
de=pd.read_pickle('df_merc')

In [21]:
de.head()

Unnamed: 0,ID,y,X0,X1,X2,X3,X4,X5,X6,X8,...,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
0,0,130.81,k,v,at,a,d,u,j,o,...,0,0,1,0,0,0,0,0,0,0
1,6,88.53,k,t,av,e,d,y,l,o,...,1,0,0,0,0,0,0,0,0,0
2,7,76.26,az,w,n,c,d,x,j,x,...,0,0,0,0,0,0,1,0,0,0
3,9,80.62,az,t,n,f,d,x,l,e,...,0,0,0,0,0,0,0,0,0,0
4,13,78.02,az,v,n,f,d,h,d,n,...,0,0,0,0,0,0,0,0,0,0
