# Checking Pandas Version
The version string is stored in the __version__ attribute.

In [33]:
import pandas as pd

print(pd.__version__)

2.3.3


# Pandas Series
A Pandas Series is like a column in a table.
It is a one-dimensional array holding data of any type.
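To see "any type" in practice, the sketch below builds a few small Series and prints the dtype that pandas infers for each. The variable names are illustrative. The exact dtype printed for integers and strings can vary with the platform and pandas version.

```python
import pandas as pd

# pandas infers one dtype per Series from the values passed in
ints = pd.Series([1, 7, 2])
floats = pd.Series([1.5, 2.0, 3.25])
words = pd.Series(["jan", "feb", "mar"])
mixed = pd.Series([1, "two", 3.0])  # mixed types fall back to the generic object dtype

for name, s in [("ints", ints), ("floats", floats), ("words", words), ("mixed", mixed)]:
    print(name, s.dtype, len(s))
```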

In [34]:
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)

0    1
1    7
2    2
dtype: int64


The describe() method summarizes the data with common statistics: count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum. With NumPy we would call a separate function for each of these. In pandas, describe() alone is enough.

We can choose exactly which statistics we get by passing a list of the ones we need to the agg() method.
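The sketch below puts the "one NumPy function per statistic" approach next to a single describe() call on the same numbers 1 to 10. One detail needs care. np.std defaults to the population standard deviation (ddof=0), but pandas reports the sample standard deviation (ddof=1). For that reason the sketch passes ddof=1 so the two results agree.

```python
import numpy as np
import pandas as pd

values = np.arange(1, 11)  # 1, 2, ..., 10
data = pd.Series(values)

# NumPy: one function call per statistic
np_stats = {
    "mean": np.mean(values),
    "std": np.std(values, ddof=1),  # ddof=1: sample standard deviation, as pandas reports
    "min": np.min(values),
    "25%": np.percentile(values, 25),
    "50%": np.median(values),
    "75%": np.percentile(values, 75),
    "max": np.max(values),
}

# pandas: a single call returns all of them (plus the count)
summary = data.describe()

for key, value in np_stats.items():
    print(f"{key:>5}  numpy={value:.5f}  pandas={summary[key]:.5f}")
```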

In [35]:
data=pd.Series([1,2,3,4,5,6,7,8,9,10])
print(data.describe())

count    10.00000
mean      5.50000
std       3.02765
min       1.00000
25%       3.25000
50%       5.50000
75%       7.75000
max      10.00000
dtype: float64


In [36]:
print(data.agg(['max','min','sum','mean','std','var','count']))

max      10.000000
min       1.000000
sum      55.000000
mean      5.500000
std       3.027650
var       9.166667
count    10.000000
dtype: float64


In [38]:
# info() prints its report itself and returns None, so it is not wrapped in print()
data.info()

<class 'pandas.core.series.Series'>
RangeIndex: 10 entries, 0 to 9
Series name: None
Non-Null Count  Dtype
--------------  -----
10 non-null     int64
dtypes: int64(1)
memory usage: 212.0 bytes


# Accessing data

## Labels
If nothing else is specified, the values are labeled with their index number: the first value has index 0, the second has index 1, and so on.

This label can be used to access a specified value.
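Labels behave much like dictionary keys. In the minimal sketch below, the in operator tests the labels, not the values, and a label that does not exist raises a KeyError.

```python
import pandas as pd

myvar = pd.Series([1, 7, 2])

# "in" tests the labels (0, 1, 2), not the values
print(2 in myvar)          # True: 2 is a label (the third position)
print(7 in myvar)          # False: 7 is a value, not a label
print(7 in myvar.values)   # True: check the underlying values instead

# Accessing a label that does not exist raises KeyError
try:
    myvar[5]
except KeyError:
    print("no label 5")
```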

In [None]:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
# return the first value of series
print(myvar[0])

0    1
1    7
2    2
dtype: int64
1


With the index argument, you can name your own labels.
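When labels are custom strings, label access and position access are no longer the same thing. The sketch below uses .loc for labels and .iloc for positions. Recent pandas versions deprecate using plain s[0] as a positional lookup on a labeled Series. Note also that a label slice includes its end label.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s.loc['c'])              # by label -> 3
print(s.iloc[2])               # by position (third value) -> 3
print(s[['a', 'e']].tolist())  # several labels at once -> [1, 5]
print(s['b':'d'].tolist())     # label slices include the end label -> [2, 3, 4]
```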

In [None]:
import pandas as pd

a=[1,2,3,4,5]
Series=pd.Series(a,index=['a','b','c','d','e'])
print(Series)

a    1
b    2
c    3
d    4
e    5
dtype: int64


In [None]:
print(Series['a'])

1


We can also create a Series from a dictionary of key:value pairs. The keys become the index labels.
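Here is a small sketch with made-up numeric data (the temps dictionary is hypothetical). The keys become the labels in insertion order, values are looked up by key, and to_dict() converts the Series back into the original dictionary.

```python
import pandas as pd

temps = {"mon": 21.5, "tue": 23.0, "wed": 19.8}  # hypothetical example data
s = pd.Series(temps)

print(s.index.tolist())      # keys become labels, in insertion order
print(s["tue"])              # look up a value by its key
print(s.to_dict() == temps)  # True: converting back gives the same dictionary
```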

In [None]:
import pandas as pd

A={"1st month":"jan",
   "2nd month":"feb",
   "3rd month":"march",
   "4th month":"apr"}
myvar=pd.Series(A)
print(myvar)

1st month      jan
2nd month      feb
3rd month    march
4th month      apr
dtype: object


To keep only some of the dictionary's items, pass the keys you want through the index argument.
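When the index argument names a key that is not in the dictionary, pandas does not raise an error. It fills that entry with NaN, as the sketch below shows ("5th month" is deliberately missing from A).

```python
import pandas as pd

A = {"1st month": "jan", "2nd month": "feb", "3rd month": "march", "4th month": "apr"}

# "5th month" is not a key of A, so pandas fills that entry with NaN
myvar3 = pd.Series(A, index=["2nd month", "5th month"])
print(myvar3)
```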


In [None]:
myvar2=pd.Series(A,index=["1st month","2nd month"])
print(myvar2)

1st month    jan
2nd month    feb
dtype: object


#### Slicing by position

In [None]:
import pandas as pd

A_series=pd.Series([1.23,2.34,3.45,4.56,5.67])
print("using slicing to access specific data\n", A_series[2:])
print("using steps to access specific data\n", A_series[0::2])

using slicing to access specific data
 2    3.45
3    4.56
4    5.67
dtype: float64
using steps to access specific data
 0    1.23
2    3.45
4    5.67
dtype: float64


# DataFrames (Tables)

Data sets in pandas are usually multi-dimensional tables, called DataFrames.
If a Series is like a column, a DataFrame is the whole table.
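The column/table relationship is easy to check. In the sketch below (the small table is hypothetical), selecting one column of a DataFrame returns a Series that shares the table's row labels.

```python
import pandas as pd

table = pd.DataFrame({"Math": [1, 2, 3], "Physics": [6, 7, 8]})  # hypothetical data

column = table["Math"]                   # selecting one column...
print(type(column).__name__)             # ...gives a Series
print(table.shape)                       # (rows, columns) -> (3, 2)
print(column.index.equals(table.index))  # True: the column shares the table's row labels
```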

### Creating DataFrame

There are various ways to create DataFrames (DF) in pandas, for example from arrays, lists, Series, and dictionaries.

We explore several of them below. The DataFrame constructor accepts several optional parameters, such as index (the row labels) and columns (the column names).

The one argument you will almost always supply is the data itself, for example an array, a list, a dictionary, or a set of Series. Calling pd.DataFrame() without it just creates an empty table.

#### Creating a DataFrame from an array

The length of the index list must match the number of rows in the data. Likewise, the length of the columns list must match the number of columns. If either length is wrong, pandas raises a ValueError.
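The sketch below checks the length rule in both directions on the same 4x4 array. Matching labels build the table, while passing only three row labels for four rows makes the constructor raise a ValueError. The exact wording of the error message depends on the pandas version.

```python
import numpy as np
import pandas as pd

data = np.arange(1, 17).reshape(4, 4)  # 4 rows x 4 columns

# Matching lengths: 4 row labels and 4 column labels
ok = pd.DataFrame(data, index=["Row1", "Row2", "Row3", "Row4"],
                  columns=["Col1", "Col2", "Col3", "Col4"])
print(ok.shape)

# Mismatched lengths: only 3 row labels for 4 rows
mismatch_raised = False
try:
    pd.DataFrame(data, index=["Row1", "Row2", "Row3"])
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```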

In [None]:
import pandas as pd
import numpy as np

#actual data
Data=np.array([[1,2,3,4],
               [5,6,7,8],
               [9,10,11,12],
               [13,14,15,16]])

#Naming rows
Row=np.array(["Row1","Row2","Row3","Row4"])
#Naming columns
Column=np.array(["Col1","Col2","Col3","Col4"])
#Creating dataframe
DataFrame=pd.DataFrame(Data, index=Row, columns=Column)

print(DataFrame)

      Col1  Col2  Col3  Col4
Row1     1     2     3     4
Row2     5     6     7     8
Row3     9    10    11    12
Row4    13    14    15    16


#### Creating a DataFrame from a list

In [None]:
import pandas as pd

Data=[["Sumit pandey",20,"Hotel Manager"],
      ["Shubham Kanojiya",24,"Clinic owner"],
      ["Akshy Maurya",25,"Production Manager"],
      ["Ayush Kharawar",30,"Police officer"],
      ["Sarvesh Maurya",29,"ML Engineer"]]

# define the column names
Column_name=["Name","age","Profession"]
# create the data frame
DataFrame=pd.DataFrame(Data, columns= Column_name)

print(DataFrame)

               Name  age          Profession
0      Sumit pandey   20       Hotel Manager
1  Shubham Kanojiya   24        Clinic owner
2      Akshy Maurya   25  Production Manager
3    Ayush Kharawar   30      Police officer
4    Sarvesh Maurya   29         ML Engineer


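#### Creating a DataFrame from Series

As mentioned earlier, a Series is essentially a one-dimensional array. To combine several Series into a DataFrame, pass a dictionary that maps column names to Series. Each Series becomes one column, and rows are matched by index label.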
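Because rows are matched by label, the Series do not need identical indexes. The sketch below uses two hypothetical Series with partly overlapping labels. The result contains every label, and pandas writes NaN wherever one Series has no value for a label.

```python
import pandas as pd

w = pd.Series({'A': 1, 'B': 2, 'C': 3})
x = pd.Series({'B': 5, 'C': 6, 'D': 7})

# Rows are matched by label; a label missing from one Series becomes NaN in that column
df = pd.DataFrame({'w': w, 'x': x})
print(df)
```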

In [None]:
import pandas as pd

w=pd.Series({'A':1,'B':2,'C':3,'D':4})
x=pd.Series({'A':5,'B':6,'C':7,'D':8})
y=pd.Series({'A':9,'B':10,'C':11,'D':12})
z=pd.Series({'A':13,'B':14,'C':15,'D':16})

df=pd.DataFrame({'a':w,'b':x,'c':y,'d':z})
print(df)

   a  b   c   d
A  1  5   9  13
B  2  6  10  14
C  3  7  11  15
D  4  8  12  16


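#### Creating a DataFrame from a list of dictionaries

We can also build a DataFrame from a list of dictionaries. Each dictionary is one row, and its keys become the column names. Below, a list comprehension computes the square, cube, and square root of each number from 0 to 19.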
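The sketch below compares the row-wise form with the column-wise form, a dictionary of lists where each key is a column. The small values are illustrative. It also shows that a key missing from one row's dictionary becomes NaN.

```python
import pandas as pd

# Dictionary of lists: each key is a column name, each list holds that column's values
by_column = pd.DataFrame({"Square": [0, 1, 4], "Cube": [0, 1, 8]})

# List of dictionaries: each dictionary is one row; a missing key becomes NaN
by_row = pd.DataFrame([{"Square": 0, "Cube": 0},
                       {"Square": 1, "Cube": 1},
                       {"Square": 4}])  # no "Cube" for this row

print(by_column, "\n")
print(by_row)
```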

In [None]:
import pandas as pd

Data=[{'Square':i**2,'Cube':i**3,"Square root":i**0.5} for i in range(20)]
df=pd.DataFrame(Data)

print(df.to_string())

    Square  Cube  Square root
0        0     0     0.000000
1        1     1     1.000000
2        4     8     1.414214
3        9    27     1.732051
4       16    64     2.000000
5       25   125     2.236068
6       36   216     2.449490
7       49   343     2.645751
8       64   512     2.828427
9       81   729     3.000000
10     100  1000     3.162278
11     121  1331     3.316625
12     144  1728     3.464102
13     169  2197     3.605551
14     196  2744     3.741657
15     225  3375     3.872983
16     256  4096     4.000000
17     289  4913     4.123106
18     324  5832     4.242641
19     361  6859     4.358899


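# Operations on DataFrames

The examples in this section use the grades DataFrame (rows a to e, one column per subject). It is created in the first code cell under "Locating specific elements" below, so run that cell first when executing the notebook from top to bottom.

#### Transposing the DF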

In [None]:
print(grades.T)

            a   b   c   d   e
Math        1   2   3   4   5
Physics     6   7   8   9  10
French     11  12  13  14  15
Chemistry  16  17  18  19  20


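#### Accessing only the keys or the values of a DF

Note: for a DataFrame, keys() returns the column labels, and values returns the data as a NumPy array with one row per DataFrame row.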

In [None]:
print(grades.keys(),"\n")
print(grades.values)

Index(['Math', 'Physics', 'French', 'Chemistry'], dtype='object') 

[[ 1  6 11 16]
 [ 2  7 12 17]
 [ 3  8 13 18]
 [ 4  9 14 19]
 [ 5 10 15 20]]
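The sketch below confirms that keys() is the same as the columns attribute. It also uses to_numpy(), which the pandas documentation recommends over .values, and indexes the resulting array by position. The grades table is rebuilt so the cell runs on its own.

```python
import pandas as pd

# Same grades table as in this notebook (rows a-e, four subjects)
grades = pd.DataFrame({'Math': [1, 2, 3, 4, 5], 'Physics': [6, 7, 8, 9, 10],
                       'French': [11, 12, 13, 14, 15], 'Chemistry': [16, 17, 18, 19, 20]},
                      index=['a', 'b', 'c', 'd', 'e'])

print(grades.keys().equals(grades.columns))  # True: keys() is the column labels
arr = grades.to_numpy()                      # a plain NumPy array of the values
print(type(arr).__name__, arr.shape)         # ndarray (5, 4)
print(arr[1, 2])                             # row 'b', column 'French' -> 12
```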


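#### Checking whether a column exists

The in operator checks the column labels of a DataFrame, and the comparison is case-sensitive.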

In [None]:
print("math" in grades)
print("Math" in grades)

False
True
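Because in only looks at column labels, row labels and cell values need their own checks. The sketch below rebuilds grades so it runs on its own. It tests row labels through grades.index and tests values with an elementwise comparison followed by any().

```python
import pandas as pd

grades = pd.DataFrame({'Math': [1, 2, 3, 4, 5], 'Physics': [6, 7, 8, 9, 10],
                       'French': [11, 12, 13, 14, 15], 'Chemistry': [16, 17, 18, 19, 20]},
                      index=['a', 'b', 'c', 'd', 'e'])

print("Math" in grades)            # True: column label
print("a" in grades)               # False: 'a' is a row label, not a column
print("a" in grades.index)         # True: check row labels explicitly
print((grades == 12).any().any())  # True: is the value 12 anywhere in the table?
```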


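#### Stacking: a vertical view of row labels, column labels, and values

stack() moves the columns into an inner index level. The result is a Series indexed by (row label, column label) pairs, with one entry per cell.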

In [None]:
print(grades.stack())

a  Math          1
   Physics       6
   French       11
   Chemistry    16
b  Math          2
   Physics       7
   French       12
   Chemistry    17
c  Math          3
   Physics       8
   French       13
   Chemistry    18
d  Math          4
   Physics       9
   French       14
   Chemistry    19
e  Math          5
   Physics      10
   French       15
   Chemistry    20
dtype: int64
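A stacked Series can be read by (row, column) pair, and unstack() reverses the operation. Depending on the pandas version, unstack() may return the columns in sorted order. For that reason, the sketch below reindexes them to the original order before comparing with grades.

```python
import pandas as pd

grades = pd.DataFrame({'Math': [1, 2, 3, 4, 5], 'Physics': [6, 7, 8, 9, 10],
                       'French': [11, 12, 13, 14, 15], 'Chemistry': [16, 17, 18, 19, 20]},
                      index=['a', 'b', 'c', 'd', 'e'])

stacked = grades.stack()             # Series indexed by (row label, column label)
print(stacked.loc[("b", "French")])  # 12

# unstack() moves the inner level back into columns; reindex restores the original column order
restored = stacked.unstack().reindex(columns=grades.columns)
print(restored.equals(grades))       # True
```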


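#### Locating specific elements for slicing and searching within the DF
We have two primary methods for this task:

Method 1: iloc (i for integer position) selects rows and columns by their integer position, just like slicing a list. The end of a slice is excluded.

Example: --> iloc[:3, :2] selects the first three rows and the first two columns.

Method 2: loc selects by label: you give the names of the rows and columns you are looking for. Unlike iloc, the end of a label slice is included.

Example: --> loc["b":"c", "Math":] selects rows 'b' through 'c' (both included) and the columns from 'Math' to the last one.

Note: With loc you must refer to columns by their names. If the index is numeric (such as the default 0, 1, 2, ... index), you can pass numbers, but they are treated as labels, not positions.

Example: df.loc[3:6, :"Cube"] on the squares-and-cubes DataFrame above selects the rows labeled 3 to 6 (inclusive) and the columns up to and including 'Cube'.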
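The sketch below shows the two differences described above: whether the end of a slice is included, and labels versus positions on a numeric index. The scores Series is hypothetical, and grades is rebuilt so the cell runs on its own.

```python
import pandas as pd

grades = pd.DataFrame({'Math': [1, 2, 3, 4, 5], 'Physics': [6, 7, 8, 9, 10],
                       'French': [11, 12, 13, 14, 15], 'Chemistry': [16, 17, 18, 19, 20]},
                      index=['a', 'b', 'c', 'd', 'e'])

print(grades.iloc[1:3].index.tolist())     # ['b', 'c']: end position 3 is excluded
print(grades.loc["b":"d"].index.tolist())  # ['b', 'c', 'd']: end label 'd' is included
print(grades.iloc[0, 1], grades.loc["a", "Physics"])  # same cell two ways -> 6 6

# With a numeric index, loc uses the numbers as labels, not positions
scores = pd.Series([90, 80, 70], index=[10, 20, 30])
print(scores.loc[20])   # 80: the entry whose label is 20
print(scores.iloc[0])   # 90: the first entry, whatever its label
```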

In [None]:
import pandas as pd

w = pd.Series({'a':1 ,'b':2 ,'c':3 ,'d':4 ,'e':5})
x = pd.Series({'a':6 ,'b':7 ,'c':8 ,'d':9 ,'e':10})
y = pd.Series({'a':11 ,'b':12 ,'c':13 ,'d':14 ,'e':15})
z = pd.Series({'a':16 ,'b':17 ,'c':18 ,'d':19 ,'e':20})

grades = pd.DataFrame({'Math':w,'Physics':x,'French':y,'Chemistry':z})
print(grades.iloc[:4,:2])

   Math  Physics
a     1        6
b     2        7
c     3        8
d     4        9


In [None]:
print(grades.loc["a":,"Physics":],"\n")
print(grades.loc["a":"c"],"\n")
print(grades.loc[grades.Math>2])

   Physics  French  Chemistry
a        6      11         16
b        7      12         17
c        8      13         18
d        9      14         19
e       10      15         20 

   Math  Physics  French  Chemistry
a     1        6      11         16
b     2        7      12         17
c     3        8      13         18 

   Math  Physics  French  Chemistry
c     3        8      13         18
d     4        9      14         19
e     5       10      15         20


In [None]:
# select the Physics and French marks of students whose Math score is greater than 2
print(grades.loc[grades.Math>2,['Physics','French']])

   Physics  French
c        8      13
d        9      14
e       10      15


In [None]:
#Access names of all columns and the index in the DF
print(grades.index)
print(grades.columns)

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
Index(['Math', 'Physics', 'French', 'Chemistry'], dtype='object')


## Sorting

sort_values() sorts the DataFrame by the values of one or more columns. The order is ascending by default. Pass ascending=False for descending order.
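When sorting by several columns, ascending can take one flag per column, and later columns break ties in earlier ones. The sketch below uses a small hypothetical table in which two students share the same Math score. sort_index() then restores the original row order.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ana", "Ben", "Cai", "Dee"],  # hypothetical data
                   "Math": [3, 5, 3, 1],
                   "Physics": [8, 6, 9, 7]})

# Sort by Math (high to low); break ties on Math with Physics (low to high)
ranked = df.sort_values(["Math", "Physics"], ascending=[False, True])
print(ranked["Name"].tolist())  # ['Ben', 'Ana', 'Cai', 'Dee']

# sort_index() puts the rows back in label order
print(ranked.sort_index()["Name"].tolist())  # ['Ana', 'Ben', 'Cai', 'Dee']
```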

In [None]:
print(grades.sort_values(['Math'],ascending=False),"\n")
print(grades.sort_values(['Math'],ascending=True))


   Math  Physics  French  Chemistry
e     5       10      15         20
d     4        9      14         19
c     3        8      13         18
b     2        7      12         17
a     1        6      11         16 

   Math  Physics  French  Chemistry
a     1        6      11         16
b     2        7      12         17
c     3        8      13         18
d     4        9      14         19
e     5       10      15         20


## Statistics for the entire DF or per column

In [45]:
import pandas as pd

w = pd.Series({'a':1 ,'b':2 ,'c':3 ,'d':4 ,'e':5})
x = pd.Series({'a':6 ,'b':7 ,'c':8 ,'d':9 ,'e':10})
y = pd.Series({'a':11 ,'b':12 ,'c':13 ,'d':14 ,'e':15})
z = pd.Series({'a':16 ,'b':17 ,'c':18 ,'d':19 ,'e':20})

grades = pd.DataFrame({'Math':w,'Physics':x,'French':y,'Chemistry':z})

# apply max() to every column of the table
print("applying max to the entire table\n",grades.max(),"\n")

# apply min() to every column of the table
print("applying min to the entire table\n",grades.min(),"\n")


applying max to the entire table
 Math          5
Physics      10
French       15
Chemistry    20
dtype: int64 

applying min to the entire table
 Math          1
Physics       6
French       11
Chemistry    16
dtype: int64 



In [48]:
print("max in only one column\n",grades["Math"].max())
print("min in only one column\n",grades["Math"].min())
print("mean in only one column\n",grades["Math"].mean())


max in only one column
 5
min in only one column
 1
mean in only one column
 3.0
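Whole-table statistics work column by column by default (axis=0). Passing axis=1 computes one result per row instead, which here means one average per student. The sketch rebuilds grades so it runs on its own. It also shows that describe() works on a DataFrame, one column at a time.

```python
import pandas as pd

grades = pd.DataFrame({'Math': [1, 2, 3, 4, 5], 'Physics': [6, 7, 8, 9, 10],
                       'French': [11, 12, 13, 14, 15], 'Chemistry': [16, 17, 18, 19, 20]},
                      index=['a', 'b', 'c', 'd', 'e'])

print(grades.mean())                           # axis=0: one mean per subject
print(grades.mean(axis=1))                     # axis=1: one mean per student, a -> 8.5 ... e -> 12.5
print(grades.describe().loc[["mean", "std"]])  # describe() summarizes each column
```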


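### The correlation between columns of the DF

corr() computes the pairwise correlation between columns, using the Pearson coefficient by default (method='spearman' and method='kendall' are also available). The coefficient measures how well a straight line relates two columns:

- values near 1 mean that one column rises as the other rises;
- values near -1 mean that one column falls as the other rises;
- values near 0 mean that there is no linear relationship.

In grades, each subject's marks are the Math marks plus a constant, so every pair of columns is perfectly correlated (1.0).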
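The sketch below illustrates the three cases with made-up data: an exact positive line, an exact negative line, and a line with small noise added. It also computes the Pearson coefficient by hand as the covariance divided by the product of the standard deviations. The noise values are arbitrary, and the first two results are exactly 1 and -1 only up to floating-point rounding.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
noise = pd.Series([0.3, -0.2, 0.1, -0.4, 0.2])  # made-up perturbations

print(x.corr(2 * x + 1))  # perfect positive linear relationship: 1
print(x.corr(-3 * x))     # perfect negative linear relationship: -1
print(x.corr(x + noise))  # strong but imperfect: about 0.98

# Pearson correlation by hand: covariance divided by the product of standard deviations
y = x + noise
dx, dy = x - x.mean(), y - y.mean()
manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(manual)             # same value as x.corr(x + noise)
```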

In [49]:
print(grades.corr())

           Math  Physics  French  Chemistry
Math        1.0      1.0     1.0        1.0
Physics     1.0      1.0     1.0        1.0
French      1.0      1.0     1.0        1.0
Chemistry   1.0      1.0     1.0        1.0


In [57]:
import pandas as pd
import numpy as np
# random data: the numbers (and their correlations) change on every run
df = pd.DataFrame(np.random.rand(5, 3), columns=['A', 'B', 'C'])
print(df)
print(df.corr())

          A         B         C
0  0.072329  0.787448  0.076466
1  0.193750  0.078668  0.345043
2  0.985219  0.019072  0.512578
3  0.126944  0.676123  0.612448
4  0.984668  0.408020  0.685040
          A         B         C
A  1.000000 -0.544060  0.602718
B -0.544060  1.000000 -0.263556
C  0.602718 -0.263556  1.000000


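### The skewness of each column in the DF

The df.skew() method returns a Series with one value per column of the DataFrame: the sample skewness of that column. Skewness measures asymmetry:

- it is 0 for a symmetric distribution;
- it is positive when the right tail is longer;
- it is negative when the left tail is longer.

Each column of grades is evenly spaced and therefore symmetric, which is why all of its values are 0.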
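The sketch below computes skewness by hand on made-up data that has one large value. It assumes pandas uses the adjusted Fisher-Pearson coefficient, G1 = sqrt(n(n-1)) / (n-2) * m3 / m2^1.5, where m2 and m3 are the second and third central moments. The final line compares that result with s.skew().

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 2, 3, 10])  # made-up data with one large value (long right tail)

n = len(s)
dev = s - s.mean()
m2 = (dev ** 2).mean()  # second central moment
m3 = (dev ** 3).mean()  # third central moment

g1 = m3 / m2 ** 1.5                       # plain (biased) sample skewness
G1 = np.sqrt(n * (n - 1)) / (n - 2) * g1  # adjusted for sample size (assumed pandas formula)

print(s.skew())  # positive: the long tail is on the right
print(G1)        # compare with s.skew()
```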

In [58]:
print(grades.skew())

Math         0.0
Physics      0.0
French       0.0
Chemistry    0.0
dtype: float64


In [74]:
import pandas as pd
import numpy as np
# random data: the numbers (and their skewness) change on every run
df = pd.DataFrame(np.random.rand(5, 3), columns=['A', 'B', 'C'])
print(df,"\n")
print(df.skew())

          A         B         C
0  0.690470  0.819168  0.786810
1  0.653758  0.145844  0.006450
2  0.834961  0.165620  0.082276
3  0.313578  0.386242  0.144225
4  0.465389  0.678836  0.524066 

A   -0.388865
B    0.344115
C    0.849054
dtype: float64
