Why Use Pandas?

• Pandas lets us analyze large data sets and draw conclusions
based on statistical theory.

• Pandas can clean messy data sets and make them readable and
relevant.

• Relevant data is very important in data science.

• Pandas is well-suited for working with tabular data, such as
spreadsheets or SQL tables.


Create DataFrame
• A pandas DataFrame can be created from various inputs, such as:
• Lists
• Dictionaries
• Series
• NumPy ndarrays
• Another DataFrame
• External input files such as CSV, JSON, HTML, Excel sheets, and more
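
The cells below cover lists, dictionaries, and CSV files. As a minimal sketch of the remaining inputs (Series, NumPy ndarray, another DataFrame), assuming the same calories/duration values used elsewhere in this notebook; the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# From a Series: the Series name becomes the column label.
calories = pd.Series([420, 380, 390], name="calories")
df_from_series = pd.DataFrame(calories)

# From a 2-D NumPy ndarray: column labels are supplied separately.
values = np.array([[420, 50], [380, 40], [390, 45]])
df_from_array = pd.DataFrame(values, columns=["calories", "duration"])

# From another DataFrame: copy=True requests an independent copy.
df_copy = pd.DataFrame(df_from_array, copy=True)

print(df_from_series)
print(df_from_array)
```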

In [39]:
import pandas as pd

data = {
    "calories": [420,380,390],
    "duration": [50,40,45]
}
# Load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)

   calories  duration
0       420        50
1       380        40
2       390        45


Create an Empty DataFrame

• The most basic DataFrame that can be created is an empty DataFrame.


In [40]:
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []
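
An empty DataFrame is often a starting point that gets filled in later. The sketch below, which reuses the calories values from earlier as an assumption, checks emptiness with the `empty` attribute and then adds a column:

```python
import pandas as pd

df = pd.DataFrame()
print(df.empty)  # True: no rows and no columns yet

# Assigning a list creates the column and a default 0..n-1 index.
df["calories"] = [420, 380, 390]
print(df.empty)
print(df.shape)
```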


In [41]:
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

   0
0  1
1  2
2  3
3  4
4  5


In [42]:
data = [['Alex',10],['Bob',12],['clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)

     Name  Age
0    Alex   10
1     Bob   12
2  clarke   13


In [47]:
data= [['Alex',30],['Ranjan',80],['Kunal',70]]
dfe = pd.DataFrame(data,columns=["Name","Age"])
print(dfe)

     Name  Age
0    Alex   30
1  Ranjan   80
2   Kunal   70


Named Indexes
• With the index argument, you can name your own indexes.

In [4]:
import pandas as pd
data = {
    "calories": [420,380,390],
    "duration": [50,40,45]
}
df = pd.DataFrame(data,index=["day1",'day2','day3'])
print(df)

      calories  duration
day1       420        50
day2       380        40
day3       390        45
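
Named indexes are mainly useful for looking rows up by label. A minimal sketch using `loc` on the same data (the variable names `row` and `day3_calories` are illustrative):

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# Select one row by its index label; the result is a Series.
row = df.loc["day2"]
print(row)

# Select a single value by row label and column label.
day3_calories = df.loc["day3", "calories"]
print(day3_calories)
```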


Create a DataFrame from a Python Dictionary

In [None]:
import pandas as pd
data = {'Name' :['Tom','Jack','Steve','Ricky'],
        'Age' :[28,34,12,18]}
df = pd.DataFrame(data,index=['rank1','rank2','rank3','rank4'])

print(df)

        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   12
rank4  Ricky   18


DataFrame from a List of Dicts

In [None]:
import pandas as pd

# Note: keys missing from a dict show up as NaN (Not a Number) in that row.
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)

print(df)

   a   b     c
0  1   2   NaN
1  5  10  20.0
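
Column c prints as 20.0 rather than 20 because NaN is a floating-point value, so the whole column is stored as float64. One possible way to handle this, sketched here under the assumption that 0 is an acceptable stand-in for the missing entry:

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)

# The missing value forces column 'c' to a float dtype.
print(df.dtypes)

# Fill the gap with 0 (an assumed default), then convert back to integers.
filled = df.fillna({"c": 0})
filled["c"] = filled["c"].astype(int)
print(filled)
```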


Load Files Into a DataFrame
• If your data sets are stored in a file, Pandas can load them into a
DataFrame.
• If a DataFrame has more rows than the display limit (60 by default),
printing it shows only the first 5 rows and the last 5 rows:





In [3]:
import pandas as pd

df = pd.read_csv('data (1).csv')
print(df)

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
..        ...    ...       ...       ...
164        60    105       140     290.8
165        60    110       145     300.0
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4

[169 rows x 4 columns]
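
Besides printing the whole frame, the number of rows shown can be controlled explicitly. The sketch below uses a made-up 169-row frame as a stand-in for the CSV file (the values are hypothetical) and shows `head()`, `tail()`, and a temporary change to the `display.max_rows` option:

```python
import pandas as pd

# Hypothetical stand-in for a large file: 169 rows of made-up values.
df = pd.DataFrame({"Duration": [60] * 169, "Pulse": range(100, 269)})

print(df.head())    # first 5 rows by default
print(df.tail(3))   # last 3 rows

# Temporarily lift the row limit so the text form contains every row.
with pd.option_context("display.max_rows", None):
    full_text = repr(df)
print(len(full_text.splitlines()))
```

Outside the `with` block the default display limit applies again.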


In [4]:
print(df.to_string())

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
5          60    102       127     300.0
6          60    110       136     374.0
7          45    104       134     253.3
8          30    109       133     195.1
9          60     98       124     269.0
10         60    103       147     329.3
11         60    100       120     250.7
12         60    106       128     345.3
13         60    104       132     379.3
14         60     98       123     275.0
15         60     98       120     215.2
16         60    100       120     300.0
17         45     90       112       NaN
18         60    103       123     323.0
19         45     97       125     243.0
20         60    108       131     364.2
21         45    100       119     282.0
22         60    130       101     300.0
23         45   

In [9]:
df.info()  # info() prints its summary itself and returns None
print(df.columns)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Duration  169 non-null    int64  
 1   Pulse     169 non-null    int64  
 2   Maxpulse  169 non-null    int64  
 3   Calories  164 non-null    float64
dtypes: float64(1), int64(3)
memory usage: 5.4 KB
Index(['Duration', 'Pulse', 'Maxpulse', 'Calories'], dtype='object')
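
The summary reports 164 non-null values out of 169 for Calories, so five entries are missing. A minimal sketch of counting and dropping missing values, using a small made-up frame rather than the CSV file:

```python
import pandas as pd

# Small hypothetical frame with one missing Calories value.
df = pd.DataFrame({
    "Duration": [60, 45, 60],
    "Calories": [409.1, None, 340.0],
})

# Count missing values per column.
missing = df.isna().sum()
print(missing)

# Keep only the rows where Calories is present.
cleaned = df.dropna(subset=["Calories"])
print(cleaned)
```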


In [18]:
df = pd.read_csv('data (1).csv')
# Assign the result back instead of calling replace(..., inplace=True) on df['Calories']
df['Calories'] = df['Calories'].replace(282.4, 1.0)

print(df)

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175       1.0
4          45    117       148     406.0
..        ...    ...       ...       ...
164        60    105       140     290.8
165        60    110       145     300.0
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4

[169 rows x 4 columns]


Note: the chained form df['Calories'].replace(282.4, 1.0, inplace=True) triggers a warning in recent pandas versions, and the behavior will change in pandas 3.0. The inplace method will never work there because the intermediate object on which we are setting values always behaves as a copy.

Instead of 'df[col].method(value, inplace=True)', use 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value), as in the cell above, to perform the operation on the original object.
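
Both ways of avoiding a chained inplace call can be sketched on a small, made-up frame (the values are hypothetical, chosen to include 282.4):

```python
import pandas as pd

values = {"Duration": [60, 45, 45], "Calories": [409.1, 282.4, 406.0]}

# Option 1: assign the result of replace() back to the column.
df = pd.DataFrame(values)
df["Calories"] = df["Calories"].replace(282.4, 1.0)

# Option 2: call replace() on the DataFrame with a {column: {old: new}} mapping.
df2 = pd.DataFrame(values)
df2 = df2.replace({"Calories": {282.4: 1.0}})

print(df)
print(df2)
```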


In [None]:
import pandas as pd

try:
    # Read the CSV file into a DataFrame
    df = pd.read_csv('data (1).csv')
    
    # Convert the 'Calories' column to Int64, handling errors
    if 'Calories' in df.columns:
        df['Calories'] = pd.to_numeric(df['Calories'], errors='coerce').astype('Int64')
        print(df['Calories'])
    else:
        print("The 'Calories' column is not found in the file.")
        
except FileNotFoundError:
    print("The file 'data (1).csv' was not found. Please check the path.")
except Exception as e:
    print(f"An error occurred: {e}")







An error occurred: cannot safely cast non-equivalent object to int64
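
The cast fails because Calories holds fractional values such as 409.1, and pandas refuses to convert them to the nullable Int64 type when that would lose information. One possible fix, assuming rounding to the nearest whole number is acceptable, is to round first and then cast. The sketch uses a few made-up values instead of the CSV file:

```python
import pandas as pd

# Hypothetical sample with fractional values and one missing entry.
calories = pd.Series([409.1, 479.0, None, 282.4], name="Calories")

try:
    calories.astype("Int64")
except (TypeError, ValueError) as exc:
    print(f"Direct cast failed: {exc}")

# Rounding first removes the fractional part, so the cast is safe;
# the missing entry becomes <NA> in the Int64 column.
as_int = calories.round().astype("Int64")
print(as_int)
```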


Sample Questions


• Implement a Python script that utilizes the pandas library to read
data from a Comma Separated Values (CSV) file and display its
contents.


• Given a pandas DataFrame comprising columns 'A', 'B', and 'C'
with respective values [10, 20, 30], [40, 50, 60], and [70, 80, 90],
develop a Python code snippet to calculate the sum of the values
in column 'B'.

In [None]:
# Q.1: Read a CSV file and display its contents
import pandas as pd

df = pd.read_csv('data (1).csv')

print(df)

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
..        ...    ...       ...       ...
164        60    105       140     290.8
165        60    110       145     300.0
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4

[169 rows x 4 columns]


In [None]:
# Q.2: Sum the values in column 'B'

import pandas as pd

data = {
    'A':[10,20,30],
    'B':[40,50,60],
    'C':[70,80,90]
}


df = pd.DataFrame(data)

sum_B = df['B'].sum()
print(sum_B)


150
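
The same `sum()` method also works on the whole DataFrame. As a short sketch on the same data, `df.sum()` totals every column and `axis=1` totals each row instead:

```python
import pandas as pd

data = {
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
}
df = pd.DataFrame(data)

# Sum of each column, returned as a Series indexed by column name.
column_sums = df.sum()
print(column_sums)

# Sum across the columns of each row.
row_sums = df.sum(axis=1)
print(row_sums)
```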
