# Pandas

## 1] Pandas introduction

### Pandas is a Python library used for working with data sets.

### It has functions for analyzing, cleaning, exploring, and manipulating data.

## 2] Why Use Pandas?

### Pandas allows us to analyze large data sets and draw conclusions based on statistical theory.

### Pandas can clean messy data sets, and make them readable and relevant.

### * Data Structures:
### Pandas offers efficient and intuitive data structures like DataFrames (tabular, spreadsheet-like data) and Series (one-dimensional labeled arrays), making it easy to organize and access data.

### * Data Cleaning and Preparation:
### It provides extensive functionalities for handling missing data (NaN values), cleaning messy datasets, removing duplicates, and transforming data into a suitable format for analysis.


### * Data Manipulation and Transformation:
### Pandas enables various operations such as filtering, sorting, grouping (with groupby), merging, joining, and reshaping data, allowing for flexible data manipulation.


### * Input/Output Operations:
### It supports reading and writing data in various formats, including CSV, Excel, SQL databases, and HDF5, facilitating data import and export.

### * Data Analysis and Statistics:
### Pandas integrates well with other numerical libraries like NumPy and offers functionalities for calculating summary statistics, correlations, and performing other analytical tasks.

### * Time Series Analysis:
### It has specialized tools for working with time-series data, including resampling, shifting, and handling time-based indexing.

### * Integration with other Libraries:
### Pandas seamlessly integrates with other essential Python libraries for data science, such as Matplotlib and Seaborn for visualization, and scikit-learn for machine learning.
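
A minimal sketch of two of the features listed above, cleaning and grouping. The data and the names `raw`, `clean`, and `totals` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one duplicated row and one missing value.
raw = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi", "Delhi"],
    "sales": [100.0, 100.0, 250.0, np.nan, 150.0],
})

# Cleaning: drop exact duplicate rows, then drop rows with missing values.
clean = raw.drop_duplicates().dropna()

# Manipulation: total sales per city, sorted from highest to lowest.
totals = clean.groupby("city")["sales"].sum().sort_values(ascending=False)
print(totals)
```

After cleaning, three rows remain, and the grouped result has one total per city.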


## 3] Import Pandas

In [None]:
import pandas
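
Pandas is commonly imported under the short alias `pd`, which the later sections of this notebook also use. The sketch below shows the alias and prints the installed version as a quick check that the import worked.

```python
import pandas as pd  # "pd" is a widely used alias, not a requirement

# Print the installed version to confirm that the import succeeded.
print(pd.__version__)
```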

### Example:

In [25]:
import pandas

# Marks for five students in two subjects.
data = {
    "student name": ["ravi", "ram", "raj", "rakesh", "rohan"],
    "english": [10, 20, 55, 60, 70],
    "maths": [15, 17, 18, 19, 20],
}
df = pandas.DataFrame(data)
print(df)

  student name  english  maths
0         ravi       10     15
1          ram       20     17
2          raj       55     18
3       rakesh       60     19
4        rohan       70     20


## 4] Pandas Series

### Series
### A Pandas Series is like a column in a table.

### It is a one-dimensional array holding data of any type.

### Create a simple Pandas Series from a list:

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)

### With the index argument, you can name your own labels.

In [26]:
import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)

x    1
y    7
z    2
dtype: int64
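
Once a Series has labels, a value can be looked up by its label. The sketch below rebuilds the same Series and also shows that a dictionary can supply the labels and values together. The name `from_dict` is hypothetical.

```python
import pandas as pd

myvar = pd.Series([1, 7, 2], index=["x", "y", "z"])

# Access a single value by its label.
print(myvar["y"])

# A dictionary also works: its keys become the labels.
from_dict = pd.Series({"x": 1, "y": 7, "z": 2})
print(from_dict.equals(myvar))
```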


## 5] Pandas DataFrames

### A pandas DataFrame is a two-dimensional, mutable, tabular data structure with labeled axes (rows and columns). It is the primary data structure used in the pandas library for data manipulation and analysis in Python. 

#### Example

In [27]:
import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

# Load the data into a DataFrame object:
df = pd.DataFrame(data)

print(df) 

   calories  duration
0       420        50
1       380        40
2       390        45


### Locate Row

In [28]:
# Refer to the row by its index label:
print(df.loc[0])

calories    420
duration     50
Name: 0, dtype: int64
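
As a small extension of the lookup above, `loc` also accepts a list of row labels, or a row label together with a column label. The sketch reuses the calories and duration data; the names `row` and `rows` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

# A single label returns one row as a Series.
row = df.loc[0]

# A list of labels returns a DataFrame containing those rows.
rows = df.loc[[0, 1]]
print(rows)

# A row label plus a column label returns a single value.
print(df.loc[2, "calories"])
```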


## 6] Pandas Read CSV

#### Example

In [None]:
import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string()) 

In [34]:
df = pd.read_csv('C:/Users/Hi/Downloads/titanic.csv')
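
`read_csv` returns a DataFrame, and `to_string()` renders every row instead of the shortened view that printing a large DataFrame normally gives. The sketch below is a self-contained way to try `read_csv` without a file on disk. The CSV text and the name `people` are hypothetical, and `io.StringIO` stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO lets read_csv treat it like a file.
csv_text = "name,age,fare\nasha,22,7.25\nbina,38,71.28\nchetan,,8.05\n"

people = pd.read_csv(io.StringIO(csv_text))

print(people.shape)   # (rows, columns)
print(people.dtypes)  # the empty age field is read as a missing value
```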

## 7] Pandas - Analyzing DataFrames

In [35]:
df.head(4)

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False


In [36]:
df.tail(4)

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
887,1,1,female,19.0,0,0,30.0,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.45,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0,C,First,man,True,C,Cherbourg,yes,True
890,0,3,male,32.0,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
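
`head(n)` returns the first `n` rows and `tail(n)` the last `n`; when no argument is given, both default to 5. A minimal sketch on a hypothetical ten-row DataFrame named `numbers`:

```python
import pandas as pd

# Hypothetical DataFrame with ten rows, labelled 0 to 9.
numbers = pd.DataFrame({"n": range(10)})

print(len(numbers.head()))   # no argument: the first 5 rows
print(len(numbers.tail(3)))  # the last 3 rows
print(numbers.shape)         # (rows, columns)
```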


## Finding Relationships

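### The corr() method calculates the pairwise correlation between the numeric columns in your data set.

#### Example

In [None]:
# numeric_only=True skips text columns such as 'sex', which cannot be converted to numbers.
df.corr(numeric_only=True)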
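
To make the result easier to read, here is a sketch on a small hypothetical DataFrame named `scores`. Each value in the result lies between -1 and 1: values near 1 mean two columns rise together, and values near -1 mean one falls as the other rises.

```python
import pandas as pd

scores = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "marks": [10, 20, 30, 40, 50],   # rises with hours
    "errors": [50, 40, 30, 20, 10],  # falls as hours rise
})

matrix = scores.corr()
print(matrix)
```

The diagonal is always 1 because every column is perfectly correlated with itself.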

## 8] Pandas operations

#### Sum of one column

In [38]:
df['age'].sum()

np.float64(21205.17)

#### Maximum of one column

In [39]:
df['age'].max()

np.float64(80.0)

#### Mean of one column

In [40]:
df['age'].mean()

np.float64(29.69911764705882)

#### Median of one column

In [41]:
df['age'].median()

np.float64(28.0)

#### Standard deviation of one column

In [42]:
df['age'].std()

np.float64(14.526497332334042)
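
The age column contains missing values, and by default these statistics skip them. Note also that `std()` computes the sample standard deviation (`ddof=1`) by default. The sketch below uses a hypothetical Series named `ages` to show the skipping, and `describe()` to get several statistics in one call.

```python
import numpy as np
import pandas as pd

ages = pd.Series([22.0, 38.0, np.nan, 26.0])

print(ages.sum())       # NaN is skipped
print(ages.count())     # number of non-missing values
print(ages.mean())      # sum divided by count, not by len(ages)
print(ages.describe())  # count, mean, std, min, quartiles, max
```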

#### Adding a column to a data set

In [49]:
df['affter five years'] = df['age'] +5


In [51]:
df

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone,age_lenght,affter five years
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False,22.0,27.0
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False,38.0,43.0
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True,26.0,31.0
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False,35.0,40.0
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True,35.0,40.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True,27.0,32.0
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True,19.0,24.0
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False,,
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True,26.0,31.0
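
Row 888 above has no age, so its new column is empty as well: arithmetic on a missing value gives a missing value. The sketch below reproduces this on a hypothetical DataFrame named `people`, and also shows `assign()`, which returns a new DataFrame instead of modifying the original. The column names `age_plus_5` and `age_months` are made up for illustration.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({"age": [22.0, 38.0, np.nan]})

# Arithmetic on a column is applied to every row at once.
people["age_plus_5"] = people["age"] + 5

# assign() returns a new DataFrame and leaves the original unchanged.
with_months = people.assign(age_months=people["age"] * 12)

print(with_months)
```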


#### Show the rows