## Pandas

Pandas provides two main data structures: Series and DataFrame.
A Series is a one-dimensional labeled array that can hold data of any type.
A DataFrame is a two-dimensional labeled table whose columns are Series; each column can have a different data type.
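
As a minimal sketch of the relationship between the two structures, the example below builds a DataFrame from two Series and checks the dtype of each column. The names `ages` and `heights` and their values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical values, chosen only for illustration
ages = pd.Series([25, 30, 35], index=["a", "b", "c"], name="Age")
heights = pd.Series([1.62, 1.80, 1.75], index=["a", "b", "c"], name="Height")

# A Series value can be looked up by its label
print(ages["b"])

# Building a DataFrame from Series keeps the shared index as row labels
df = pd.DataFrame({"Age": ages, "Height": heights})

# Each column keeps its own dtype (integer vs. floating point here)
print(df.dtypes)

# Selecting a single column returns a Series again
print(type(df["Age"]))
```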

Pandas provides a wide range of functions for cleaning, preprocessing, manipulating, and analyzing data. Commonly used operations include reading data from and writing data to files, merging and joining data from multiple sources, reshaping data, filtering rows based on conditions, and grouping and aggregating data.
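
The operations listed above can be sketched on a small table. The `cities` table and the in-memory CSV round trip are assumptions made for illustration; a real project would typically pass a file path to `to_csv` and `read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical sample data, invented for illustration
people = pd.DataFrame({
    "Name": ["Sangita", "Rohan", "Max"],
    "Age": [25, 30, 35],
    "Gender": ["Female", "Male", "Male"],
})
cities = pd.DataFrame({
    "Name": ["Sangita", "Rohan", "Max"],
    "City": ["Pune", "Delhi", "Berlin"],
})

# Filtering: keep only the rows that satisfy a condition
over_28 = people[people["Age"] > 28]

# Grouping and aggregating: mean age per gender
mean_age = people.groupby("Gender")["Age"].mean()

# Merging: combine two tables on a shared key column
merged = people.merge(cities, on="Name", how="left")

# Reshaping: turn the wide table into long (Name, variable, value) form
long_form = people.melt(id_vars="Name", value_vars=["Age", "Gender"])

# Writing and reading: round-trip through CSV text held in memory
csv_text = people.to_csv(index=False)
restored = pd.read_csv(io.StringIO(csv_text))
```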

EDA stands for Exploratory Data Analysis. It is the process of analyzing and summarizing data in order to gain insights and understand the patterns, relationships, and distributions within the data. EDA is often the first step in any data analysis project and is used to identify anomalies, detect missing or inconsistent data, and explore potential relationships between variables. Common EDA techniques include summary statistics, data visualization, correlation analysis, and clustering analysis. EDA is important because it helps to ensure that the data is appropriate for the analysis being conducted and can provide valuable insights that can guide subsequent analyses.
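
A first EDA pass often combines a few of these techniques. The sketch below, using a hypothetical `Age`/`Income` table with one missing value, shows summary statistics, a missing-value count, and a correlation matrix; it is a starting point rather than a complete workflow.

```python
import pandas as pd

# Hypothetical data with one missing Age value
df = pd.DataFrame({
    "Age": [25, 30, 35, None],
    "Income": [40000, 52000, 61000, 58000],
})

# Summary statistics: count, mean, std, min, quartiles, max per column
summary = df.describe()

# Missing data: number of missing values in each column
missing = df.isna().sum()

# Correlation analysis: pairwise Pearson correlation between numeric columns
corr = df.corr()

print(summary)
print(missing)
print(corr)
```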

In [None]:
# Pandas:
# 1) Series: 1D labeled array; can hold any data type (including arbitrary Python objects), but one Series has a single dtype
# 2) DataFrame: 2D labeled table; each column is a Series

In [2]:
import pandas as pd

In [None]:
# Series: one-dimensional labeled array (a single column)

In [3]:
pd.Series([10,20,30])

0    10
1    20
2    30
dtype: int64

In [4]:
s1 = pd.Series(data=[10,20,30], index=['a', 'b' ,'c'])
s1

a    10
b    20
c    30
dtype: int64

In [5]:
s1 = pd.Series(data=[10,20,30])

In [6]:
s1

0    10
1    20
2    30
dtype: int64

In [7]:
("a b c".split(" "))

['a', 'b', 'c']

In [8]:
labels = "a b c".split(" ")  # labels = ['a', 'b', 'c']
mydata = [10,20,30]

In [9]:
labels

['a', 'b', 'c']

In [10]:
mydata

[10, 20, 30]

In [11]:
s1 = pd.Series(data=mydata, index=labels)
print(s1)

a    10
b    20
c    30
dtype: int64


In [12]:
s1 = pd.Series(data=labels, index=mydata)
print(s1)

10    a
20    b
30    c
dtype: object


In [13]:
pd.Series([10,20,30])

0    10
1    20
2    30
dtype: int64

In [14]:
pd.Series((10,20,30))

0    10
1    20
2    30
dtype: int64

In [15]:
d1 = {'a': 10,
      'b': 20,
      'c': 30}

In [16]:
d1

{'a': 10, 'b': 20, 'c': 30}

In [17]:
s1 = pd.Series(d1)
s1

a    10
b    20
c    30
dtype: int64

In [18]:
d2 = {'a': [10, 20],
      'b': [20, 40],
      'c': [30, 50]}

In [19]:
d2

{'a': [10, 20], 'b': [20, 40], 'c': [30, 50]}

In [20]:
s2 = pd.Series(d2)
s2

a    [10, 20]
b    [20, 40]
c    [30, 50]
dtype: object

In [21]:
# DataFrame: two-dimensional labeled table that holds multiple Series as columns; each column can have a different data type
# pd.DataFrame(data, index, columns)


In [22]:
pd.DataFrame(data=[1,2,3],index=["a","b","c"],columns=["Col1"])

   Col1
a     1
b     2
c     3


In [23]:
d2

{'a': [10, 20], 'b': [20, 40], 'c': [30, 50]}

In [24]:
pd.DataFrame(d2)

    a   b   c
0  10  20  30
1  20  40  50


In [25]:
pd.DataFrame(d2,index=["R1","R2"])

     a   b   c
R1  10  20  30
R2  20  40  50


In [26]:
data = {
    'Name': ['Sangita', 'Rohan', 'Max'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

In [27]:
# Create DataFrame
df = pd.DataFrame(data,index=["a","b","c"])

In [28]:
df

      Name  Age  Gender
a  Sangita   25  Female
b    Rohan   30    Male
c      Max   35    Male


In [29]:
data = {
    'Name': ['Sangita', 'Rohan', 'Max'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male']
}

In [30]:
data

{'Name': ['Sangita', 'Rohan', 'Max'],
 'Age': [25, 30, 35],
 'Gender': ['Female', 'Male']}

In [31]:
# Create DataFrame
df = pd.DataFrame(data,index=["a","b","c"])
df

ValueError: could not broadcast input array from shape (2,) into shape (3,)

In [32]:
df = pd.DataFrame(data)
df

ValueError: All arrays must be of the same length
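
Both errors arise because 'Gender' has two values while the other columns have three. One possible workaround, sketched here under the assumption that a missing value is acceptable for the short column, is to wrap each list in a `pd.Series` so that pandas aligns the columns on the index and pads the gap with NaN.

```python
import pandas as pd

data = {
    'Name': ['Sangita', 'Rohan', 'Max'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male'],  # one value short
}

# Wrapping each list in a Series lets pandas align on the index (0, 1, 2)
# and fill the missing position with NaN instead of raising ValueError
df = pd.DataFrame({key: pd.Series(values) for key, values in data.items()})

# Row labels can still be assigned afterwards
df.index = ["a", "b", "c"]
print(df)
```

Whether padding is appropriate depends on the data; if the missing entry is simply an input mistake, correcting the source list is the better fix.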