### What is Pandas?
* Pandas is a Python library that makes working with data easy. It's especially useful when you're working with tables, spreadsheets, CSV files, or any other kind of structured data.

### What is it used for?
* Reading data (from CSV, Excel, SQL, JSON, etc.)

* Cleaning data (handling missing values, fixing columns)

* Analyzing data (filtering, grouping, summarizing)

* Visualizing simple data trends (with the help of libraries like Matplotlib)

* Manipulating data (adding/deleting columns, changing values)

### Pandas Library – Intermediate Overview
#### 1. DataFrame and Series (Recap)
* A Series is like a single column (1D).

* A DataFrame is like a full Excel sheet with rows and columns (2D).

#### 2. Indexing
Pandas lets you control row and column indexing:

* You can rename the index or set a specific column as the index.

* There are two main ways to access data:
  * Label-based: using .loc (e.g., by row label)
  * Position-based: using .iloc (e.g., by row number)
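A minimal sketch of setting a column as the index and then reading the same cell by label and by position. The DataFrame, its column names, and the values are made up for illustration:

```python
import pandas as pd

# Small illustrative DataFrame (made-up data)
students = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera"],
    "city": ["Pune", "Mumbai", "Nashik"],
    "age": [21, 22, 23],
})

# Use the "name" column as the row index instead of 0, 1, 2
people = students.set_index("name")

# Give the index itself a clearer name
people.index.name = "student"

# Label-based: row labelled "Ravi", column "city"
print(people.loc["Ravi", "city"])   # Mumbai

# Position-based: second row (position 1), first column (position 0)
print(people.iloc[1, 0])            # Mumbai
```

After `set_index`, "name" is no longer a regular column, so position 0 refers to "city".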

#### 3. Data Cleaning
Real-world data is often messy. Pandas helps with:

* Handling missing values

* Fixing data types (like string vs number)

* Removing duplicates

* Replacing or renaming values or columns
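The cleaning steps above can be chained together. This is a sketch on a tiny made-up DataFrame (the columns `item`, `qty`, and `price` are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative messy data: a duplicate row, a missing price, numbers stored as text
raw = pd.DataFrame({
    "item": ["pen", "pen", "book", "bag"],
    "qty": ["1", "1", "2", "3"],          # numbers stored as strings
    "price": [10.0, 10.0, np.nan, 250.0],
})

clean = (
    raw.drop_duplicates()                          # remove the repeated "pen" row
       .astype({"qty": "int64"})                   # fix the data type (string -> integer)
       .fillna({"price": 0.0})                     # handle the missing value
       .rename(columns={"qty": "quantity"})        # rename a column
       .replace({"item": {"bag": "backpack"}})     # replace a value in one column
)
print(clean)
```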

#### 4. Aggregation and Grouping
Pandas lets you group rows by the values in one or more columns and then compute a summary for each group, such as:

* Sum, average, count, min, max

This is helpful for getting insights from categories (e.g., total sales by region).
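A minimal sketch of "sales by region", using made-up numbers (the `region` and `revenue` columns are illustrative):

```python
import pandas as pd

# Illustrative sales records (made-up numbers)
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "revenue": [100, 250, 300, 80, 50],
})

# Group rows by region, then compute several summaries per group
summary = sales.groupby("region")["revenue"].agg(["sum", "mean", "count", "min", "max"])
print(summary)
```

By default `groupby` sorts the group keys, so the regions come out in alphabetical order.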

#### 5. Merging and Joining
You can combine multiple DataFrames using:

* Merge (like SQL joins)

* Concatenation (stacking data vertically or horizontally)

* This is useful when working with data from different sources.
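The notebook demonstrates `pd.merge` further down; concatenation is not shown there, so here is a small sketch with made-up monthly data (the frames `jan`, `feb`, and `prices` are illustrative):

```python
import pandas as pd

jan = pd.DataFrame({"item": ["pen", "book"], "units": [5, 2]})
feb = pd.DataFrame({"item": ["bag"], "units": [1]})

# Stack vertically (one below the other); ignore_index renumbers rows 0..n-1
stacked = pd.concat([jan, feb], ignore_index=True)

# Place side by side (axis=1), aligning rows on the index
prices = pd.DataFrame({"price": [10.0, 300.0]})
side_by_side = pd.concat([jan, prices], axis=1)

print(stacked)
print(side_by_side)
```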

#### 6. Data Transformation
Pandas allows you to:

* Add or remove columns

* Apply functions to rows or columns

* Reorder or sort data

* Convert between data types

You can also reshape your data using methods like:

* pivot or pivot_table (make long data wide: the unique values of one column become new columns)

* melt (make wide data long)
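A small round trip between the two shapes, using made-up monthly sales (the column names are assumptions for illustration):

```python
import pandas as pd

# Long format: one row per (month, region) observation (made-up numbers)
long_df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "region": ["East", "West", "East", "West"],
    "sales": [100, 200, 150, 250],
})

# pivot_table: unique regions become columns, months become rows
wide = long_df.pivot_table(index="month", columns="region", values="sales", aggfunc="sum")

# melt: turn the region columns back into rows
back_to_long = wide.reset_index().melt(id_vars="month", var_name="region", value_name="sales")

print(wide)
print(back_to_long)
```

`pivot_table` aggregates duplicates with `aggfunc`; plain `pivot` would raise an error if a (month, region) pair appeared twice.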

#### 7. Time Series Support
Pandas is very good for handling time-based data:

* Date/time parsing

* Resampling (e.g., from daily to monthly)

* Rolling averages and trends over time
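A sketch of all three features on 60 days of made-up values (the `date` and `value` columns are illustrative):

```python
import pandas as pd

# Illustrative daily values for 60 days starting 2024-01-01, with dates stored as text
daily = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=60, freq="D").strftime("%Y-%m-%d"),
    "value": range(60),
})

# 1. Parse text dates into real datetimes and use them as the index
daily["date"] = pd.to_datetime(daily["date"])
daily = daily.set_index("date")

# 2. Resample daily data to monthly totals ("MS" labels each month by its first day)
monthly = daily["value"].resample("MS").sum()

# 3. 7-day rolling average (the first 6 values are NaN: not enough history yet)
rolling_avg = daily["value"].rolling(window=7).mean()

print(monthly)
print(rolling_avg.head(8))
```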

#### 8. Working with Files
Pandas supports:

* Reading/writing CSV, Excel, SQL, JSON, Parquet, and more

* Useful for loading large datasets from disk or cloud
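A minimal write/read round trip, plus reading in chunks, which is one way to handle files too large to load at once. The file name `students.csv` and the data are made up, and a temporary folder is used so nothing is left on disk:

```python
import os
import tempfile

import pandas as pd

students = pd.DataFrame({"name": ["pratik", "akshay"], "age": [21, 21]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "students.csv")

    # index=False: don't write the 0, 1, ... row labels as an extra column
    students.to_csv(path, index=False)
    loaded = pd.read_csv(path)

    # chunksize returns an iterator of smaller DataFrames instead of one big one
    total_age = 0
    with pd.read_csv(path, chunksize=1) as reader:
        for chunk in reader:
            total_age += chunk["age"].sum()

print(loaded)
print(total_age)
```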



In [11]:
! pip install pandas



In [12]:
import pandas as pd 

In [13]:
data = [1, 2, 3, 4, 5]
print("series\n" , pd.Series(data))

series
 0    1
1    2
2    3
3    4
4    5
dtype: int64


In [14]:
pd.DataFrame(data)

|   | 0 |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |


In [15]:
## Create a DataFrame from a dictionary (each key becomes a column)

dic = {
    "name":["pratik","Pushkar" , "Akshay" , "Vedant" , "Mayur" , "Jayesh"],
    "age" : [21,21,21,21,21,21],
    "City" : ["Pune","Pune","Pune","Pune","Pune","Pune"],
    "Address" : ["Kamgarnagar","Kamgarnagar","Kamgarnagar","nigdi","sara","sara"],
    "Sem" : ["VII","VII","VII","VII","VII","VII"]
}

print(pd.DataFrame(dic))

      name  age  City      Address  Sem
0   pratik   21  Pune  Kamgarnagar  VII
1  Pushkar   21  Pune  Kamgarnagar  VII
2   Akshay   21  Pune  Kamgarnagar  VII
3   Vedant   21  Pune        nigdi  VII
4    Mayur   21  Pune         sara  VII
5   Jayesh   21  Pune         sara  VII


In [16]:
df = pd.DataFrame(dic)
df

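|   | name | age | City | Address | Sem |
|---|---|---|---|---|---|
| 0 | pratik | 21 | Pune | Kamgarnagar | VII |
| 1 | Pushkar | 21 | Pune | Kamgarnagar | VII |
| 2 | Akshay | 21 | Pune | Kamgarnagar | VII |
| 3 | Vedant | 21 | Pune | nigdi | VII |
| 4 | Mayur | 21 | Pune | sara | VII |
| 5 | Jayesh | 21 | Pune | sara | VII |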


In [17]:
import pandas as pd 

In [18]:
dic1 = {
    "student1" : {"name" : "Pratik" , "age" : 21 , "city" : "Pune"},
    "student2" : {"name" : "Pushkar" , "age" : 21 , "city" : "Pune"},
    "student3" : {"name" : "Akshay" , "age" : 21 , "city" : "Pune"}
}

print(pd.Series(dic1))

student1     {'name': 'Pratik', 'age': 21, 'city': 'Pune'}
student2    {'name': 'Pushkar', 'age': 21, 'city': 'Pune'}
student3     {'name': 'Akshay', 'age': 21, 'city': 'Pune'}
dtype: object


In [19]:
df = pd.DataFrame(dic1)
df

|   | student1 | student2 | student3 |
|---|---|---|---|
| name | Pratik | Pushkar | Akshay |
| age | 21 | 21 | 21 |
| city | Pune | Pune | Pune |


In [20]:
import numpy as np

np.array(df)


array([['Pratik', 'Pushkar', 'Akshay'],
       [21, 21, 21],
       ['Pune', 'Pune', 'Pune']], dtype=object)

In [21]:
df = pd.read_csv(r"D:\python\1-python_basics\Sales_Data.csv")

In [22]:
df.head(10)

|   | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Region | Payment Method |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 2024-01-01 | Electronics | iPhone 14 Pro | 2 | 999.99 | 1999.98 | North America | Credit Card |
| 1 | 10002 | 2024-01-02 | Home Appliances | Dyson V11 Vacuum | 1 | 499.99 | 499.99 | Europe | PayPal |
| 2 | 10003 | 2024-01-03 | Clothing | Levi's 501 Jeans | 3 | 69.99 | 209.97 | Asia | Debit Card |
| 3 | 10004 | 2024-01-04 | Books | The Da Vinci Code | 4 | 15.99 | 63.96 | North America | Credit Card |
| 4 | 10005 | 2024-01-05 | Beauty Products | Neutrogena Skincare Set | 1 | 89.99 | 89.99 | Europe | PayPal |
| 5 | 10006 | 2024-01-06 | Sports | Wilson Evolution Basketball | 5 | 29.99 | 149.95 | Asia | Credit Card |
| 6 | 10007 | 2024-01-07 | Electronics | MacBook Pro 16-inch | 1 | 2499.99 | 2499.99 | North America | Credit Card |
| 7 | 10008 | 2024-01-08 | Home Appliances | Blueair Classic 480i | 2 | 599.99 | 1199.98 | Europe | PayPal |
| 8 | 10009 | 2024-01-09 | Clothing | Nike Air Force 1 | 6 | 89.99 | 539.94 | Asia | Debit Card |
| 9 | 10010 | 2024-01-10 | Books | Dune by Frank Herbert | 2 | 25.99 | 51.98 | North America | Credit Card |


In [23]:
df.tail()

|   | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Region | Payment Method |
|---|---|---|---|---|---|---|---|---|---|
| 235 | 10236 | 2024-08-23 | Home Appliances | Nespresso Vertuo Next Coffee and Espresso Maker | 1 | 159.99 | 159.99 | Europe | PayPal |
| 236 | 10237 | 2024-08-24 | Clothing | Nike Air Force 1 Sneakers | 3 | 90.0 | 270.0 | Asia | Debit Card |
| 237 | 10238 | 2024-08-25 | Books | The Handmaid's Tale by Margaret Atwood | 3 | 10.99 | 32.97 | North America | Credit Card |
| 238 | 10239 | 2024-08-26 | Beauty Products | Sunday Riley Luna Sleeping Night Oil | 1 | 55.0 | 55.0 | Europe | PayPal |
| 239 | 10240 | 2024-08-27 | Sports | Yeti Rambler 20 oz Tumbler | 2 | 29.99 | 59.98 | Asia | Credit Card |


In [24]:
df.count()

Transaction ID      240
Date                240
Product Category    240
Product Name        240
Units Sold          240
Unit Price          240
Total Revenue       240
Region              240
Payment Method      240
dtype: int64

In [25]:
df.columns

Index(['Transaction ID', 'Date', 'Product Category', 'Product Name',
       'Units Sold', 'Unit Price', 'Total Revenue', 'Region',
       'Payment Method'],
      dtype='object')

In [26]:
df["Product Category"]

0          Electronics
1      Home Appliances
2             Clothing
3                Books
4      Beauty Products
            ...       
235    Home Appliances
236           Clothing
237              Books
238    Beauty Products
239             Sports
Name: Product Category, Length: 240, dtype: object

In [27]:
df["Payment Method"]

0      Credit Card
1           PayPal
2       Debit Card
3      Credit Card
4           PayPal
          ...     
235         PayPal
236     Debit Card
237    Credit Card
238         PayPal
239    Credit Card
Name: Payment Method, Length: 240, dtype: object

In [28]:
type(df["Payment Method"])

pandas.core.series.Series

### Using loc[ ] (Label-Based)
```python
df.loc['a']           # Row with label 'a'
df.loc['a':'b']       # Rows 'a' and 'b' (inclusive)
df.loc['a', 'Name']   # Value at row 'a' and column 'Name'

```
### Using iloc[ ] (Integer Position-Based)
```python
df.iloc[0]            # First row
df.iloc[0:2]          # First two rows (0 and 1)
df.iloc[0, 1]         # Value at 1st row, 2nd column
```
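The snippets above assume a DataFrame whose rows are labelled 'a', 'b', ... and which has a 'Name' column; the sales data in this notebook uses 0, 1, 2, ... instead. A runnable version with a hypothetical frame of that shape:

```python
import pandas as pd

# Hypothetical DataFrame with string row labels 'a', 'b', 'c' (illustrative only)
labeled = pd.DataFrame(
    {"Name": ["Asha", "Ravi", "Meera"], "Age": [21, 22, 23]},
    index=["a", "b", "c"],
)

row_a = labeled.loc["a"]              # Series for the row labelled 'a'
rows_ab = labeled.loc["a":"b"]        # label slices include BOTH ends -> 2 rows
name_a = labeled.loc["a", "Name"]     # single value -> 'Asha'

first_row = labeled.iloc[0]           # same row as labeled.loc['a']
first_two = labeled.iloc[0:2]         # position slices EXCLUDE the end -> positions 0 and 1
value = labeled.iloc[0, 1]            # 1st row, 2nd column -> 21
```

The key difference to remember: `.loc` slices include the stop label, while `.iloc` slices stop before the end position, just like Python lists.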


In [29]:
df.loc[0, "Date"]

'2024-01-01'

In [30]:
df.loc[0, "Product Name"]

'iPhone 14 Pro'

In [31]:
df.iloc[1, 7]

'Europe'

In [32]:
df.iloc[0:]

|   | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Region | Payment Method |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 2024-01-01 | Electronics | iPhone 14 Pro | 2 | 999.99 | 1999.98 | North America | Credit Card |
| 1 | 10002 | 2024-01-02 | Home Appliances | Dyson V11 Vacuum | 1 | 499.99 | 499.99 | Europe | PayPal |
| 2 | 10003 | 2024-01-03 | Clothing | Levi's 501 Jeans | 3 | 69.99 | 209.97 | Asia | Debit Card |
| 3 | 10004 | 2024-01-04 | Books | The Da Vinci Code | 4 | 15.99 | 63.96 | North America | Credit Card |
| 4 | 10005 | 2024-01-05 | Beauty Products | Neutrogena Skincare Set | 1 | 89.99 | 89.99 | Europe | PayPal |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 235 | 10236 | 2024-08-23 | Home Appliances | Nespresso Vertuo Next Coffee and Espresso Maker | 1 | 159.99 | 159.99 | Europe | PayPal |
| 236 | 10237 | 2024-08-24 | Clothing | Nike Air Force 1 Sneakers | 3 | 90.00 | 270.00 | Asia | Debit Card |
| 237 | 10238 | 2024-08-25 | Books | The Handmaid's Tale by Margaret Atwood | 3 | 10.99 | 32.97 | North America | Credit Card |
| 238 | 10239 | 2024-08-26 | Beauty Products | Sunday Riley Luna Sleeping Night Oil | 1 | 55.00 | 55.00 | Europe | PayPal |


In [33]:
## Accessing a single value by row label and column name using at

df.at[3,"Region"]

'North America'

In [34]:
## Accessing a single value by integer row and column position using iat

df.iat[23,8]

'Credit Card'

#### Data manipulation using pandas

In [35]:
df 

|   | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Region | Payment Method |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 2024-01-01 | Electronics | iPhone 14 Pro | 2 | 999.99 | 1999.98 | North America | Credit Card |
| 1 | 10002 | 2024-01-02 | Home Appliances | Dyson V11 Vacuum | 1 | 499.99 | 499.99 | Europe | PayPal |
| 2 | 10003 | 2024-01-03 | Clothing | Levi's 501 Jeans | 3 | 69.99 | 209.97 | Asia | Debit Card |
| 3 | 10004 | 2024-01-04 | Books | The Da Vinci Code | 4 | 15.99 | 63.96 | North America | Credit Card |
| 4 | 10005 | 2024-01-05 | Beauty Products | Neutrogena Skincare Set | 1 | 89.99 | 89.99 | Europe | PayPal |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 235 | 10236 | 2024-08-23 | Home Appliances | Nespresso Vertuo Next Coffee and Espresso Maker | 1 | 159.99 | 159.99 | Europe | PayPal |
| 236 | 10237 | 2024-08-24 | Clothing | Nike Air Force 1 Sneakers | 3 | 90.00 | 270.00 | Asia | Debit Card |
| 237 | 10238 | 2024-08-25 | Books | The Handmaid's Tale by Margaret Atwood | 3 | 10.99 | 32.97 | North America | Credit Card |
| 238 | 10239 | 2024-08-26 | Beauty Products | Sunday Riley Luna Sleeping Night Oil | 1 | 55.00 | 55.00 | Europe | PayPal |


In [42]:
## Remove a column
df.drop('Region', axis = 1)   # axis=0 (the default) drops rows; axis=1 drops columns. Returns a new DataFrame, df itself is unchanged

|   | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Payment Method |
|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 2024-01-01 | Electronics | iPhone 14 Pro | 2 | 999.99 | 1999.98 | Credit Card |
| 2 | 10003 | 2024-01-03 | Clothing | Levi's 501 Jeans | 3 | 69.99 | 209.97 | Debit Card |
| 3 | 10004 | 2024-01-04 | Books | The Da Vinci Code | 4 | 15.99 | 63.96 | Credit Card |
| 4 | 10005 | 2024-01-05 | Beauty Products | Neutrogena Skincare Set | 1 | 89.99 | 89.99 | PayPal |
| 5 | 10006 | 2024-01-06 | Sports | Wilson Evolution Basketball | 5 | 29.99 | 149.95 | Credit Card |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 235 | 10236 | 2024-08-23 | Home Appliances | Nespresso Vertuo Next Coffee and Espresso Maker | 1 | 159.99 | 159.99 | PayPal |
| 236 | 10237 | 2024-08-24 | Clothing | Nike Air Force 1 Sneakers | 3 | 90.00 | 270.00 | Debit Card |
| 237 | 10238 | 2024-08-25 | Books | The Handmaid's Tale by Margaret Atwood | 3 | 10.99 | 32.97 | Credit Card |
| 238 | 10239 | 2024-08-26 | Beauty Products | Sunday Riley Luna Sleeping Night Oil | 1 | 55.00 | 55.00 | PayPal |


In [43]:
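## For a permanent change: inplace=True modifies df itself instead of returning a new DataFrame
try:
    df.drop("Region", axis=1, inplace=True)
except KeyError:  # raised when "Region" no longer exists (e.g., the cell was run twice)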
    print("Column already deleted")
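An alternative to the try/except above is the `columns=` keyword together with `errors="ignore"`, which skips labels that don't exist. A sketch on a small made-up frame (named `demo` so it doesn't overwrite the notebook's `df`):

```python
import pandas as pd

demo = pd.DataFrame({"Region": ["Asia", "Europe"], "Units Sold": [3, 1]})

# Returns a NEW DataFrame without "Region"; demo itself is unchanged
without_region = demo.drop(columns="Region")

# errors="ignore" skips labels that don't exist, so re-running this line is safe
again = without_region.drop(columns="Region", errors="ignore")

# Rows are dropped by index label (equivalent to axis=0, the default)
without_first_row = demo.drop(index=0)
```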

In [44]:
df

|   | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Payment Method |
|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 2024-01-01 | Electronics | iPhone 14 Pro | 2 | 999.99 | 1999.98 | Credit Card |
| 2 | 10003 | 2024-01-03 | Clothing | Levi's 501 Jeans | 3 | 69.99 | 209.97 | Debit Card |
| 3 | 10004 | 2024-01-04 | Books | The Da Vinci Code | 4 | 15.99 | 63.96 | Credit Card |
| 4 | 10005 | 2024-01-05 | Beauty Products | Neutrogena Skincare Set | 1 | 89.99 | 89.99 | PayPal |
| 5 | 10006 | 2024-01-06 | Sports | Wilson Evolution Basketball | 5 | 29.99 | 149.95 | Credit Card |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 235 | 10236 | 2024-08-23 | Home Appliances | Nespresso Vertuo Next Coffee and Espresso Maker | 1 | 159.99 | 159.99 | PayPal |
| 236 | 10237 | 2024-08-24 | Clothing | Nike Air Force 1 Sneakers | 3 | 90.00 | 270.00 | Debit Card |
| 237 | 10238 | 2024-08-25 | Books | The Handmaid's Tale by Margaret Atwood | 3 | 10.99 | 32.97 | Credit Card |
| 238 | 10239 | 2024-08-26 | Beauty Products | Sunday Riley Luna Sleeping Night Oil | 1 | 55.00 | 55.00 | PayPal |


In [45]:
df["Unit Price"] = df["Unit Price"] + 10

In [46]:
df

|   | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Payment Method |
|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 2024-01-01 | Electronics | iPhone 14 Pro | 2 | 1009.99 | 1999.98 | Credit Card |
| 2 | 10003 | 2024-01-03 | Clothing | Levi's 501 Jeans | 3 | 79.99 | 209.97 | Debit Card |
| 3 | 10004 | 2024-01-04 | Books | The Da Vinci Code | 4 | 25.99 | 63.96 | Credit Card |
| 4 | 10005 | 2024-01-05 | Beauty Products | Neutrogena Skincare Set | 1 | 99.99 | 89.99 | PayPal |
| 5 | 10006 | 2024-01-06 | Sports | Wilson Evolution Basketball | 5 | 39.99 | 149.95 | Credit Card |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 235 | 10236 | 2024-08-23 | Home Appliances | Nespresso Vertuo Next Coffee and Espresso Maker | 1 | 169.99 | 159.99 | PayPal |
| 236 | 10237 | 2024-08-24 | Clothing | Nike Air Force 1 Sneakers | 3 | 100.00 | 270.00 | Debit Card |
| 237 | 10238 | 2024-08-25 | Books | The Handmaid's Tale by Margaret Atwood | 3 | 20.99 | 32.97 | Credit Card |
| 238 | 10239 | 2024-08-26 | Beauty Products | Sunday Riley Luna Sleeping Night Oil | 1 | 65.00 | 55.00 | PayPal |


In [47]:
df.describe()

|   | Transaction ID | Units Sold | Unit Price | Total Revenue |
|---|---|---|---|---|
| count | 239.0 | 239.0 | 239.0 | 239.0 |
| mean | 10120.995816 | 2.16318 | 245.292678 | 335.011967 |
| std | 69.144806 | 1.323092 | 430.007202 | 486.707016 |
| min | 10001.0 | 1.0 | 16.5 | 6.5 |
| 25% | 10061.5 | 1.0 | 39.5 | 61.97 |
| 50% | 10121.0 | 2.0 | 99.99 | 179.97 |
| 75% | 10180.5 | 3.0 | 259.99 | 398.5 |
| max | 10240.0 | 10.0 | 3909.99 | 3899.99 |


In [48]:
df.dtypes

Transaction ID        int64
Date                 object
Product Category     object
Product Name         object
Units Sold            int64
Unit Price          float64
Total Revenue       float64
Payment Method       object
dtype: object

## Part 2 of pandas 


In [90]:
df = pd.read_csv(r"D:\python\1-python_basics\data.csv")

In [91]:
df.head()

|   | Date | Category | Value | Product | Sales | Region |
|---|---|---|---|---|---|---|
| 0 | 2023-01-01 | A | 28.0 | Product1 | 754.0 | East |
| 1 | 2023-01-02 | B | 39.0 | Product3 | 110.0 | North |
| 2 | 2023-01-03 | C | 32.0 | Product2 | 398.0 | East |
| 3 | 2023-01-04 | B | 8.0 | Product1 | 522.0 | East |
| 4 | 2023-01-05 | B | 26.0 | Product3 | 869.0 | North |


In [92]:
df.describe()

|   | Value | Sales |
|---|---|---|
| count | 47.0 | 46.0 |
| mean | 51.744681 | 557.130435 |
| std | 29.050532 | 274.598584 |
| min | 2.0 | 108.0 |
| 25% | 27.5 | 339.0 |
| 50% | 54.0 | 591.5 |
| 75% | 70.0 | 767.5 |
| max | 99.0 | 992.0 |


In [93]:
df.dtypes

Date         object
Category     object
Value       float64
Product      object
Sales       float64
Region       object
dtype: object

In [94]:
df.isnull().any(axis =1 )

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11     True
12    False
13    False
14    False
15     True
16    False
17     True
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28     True
29    False
30    False
31    False
32    False
33     True
34    False
35     True
36    False
37     True
38    False
39    False
40    False
41    False
42    False
43    False
44    False
45    False
46    False
47    False
48    False
49    False
dtype: bool

In [95]:
df.isnull().sum()

Date        0
Category    0
Value       3
Product     0
Sales       4
Region      0
dtype: int64

In [96]:
df_filled = df.fillna(0)

In [97]:
df

|   | Date | Category | Value | Product | Sales | Region |
|---|---|---|---|---|---|---|
| 0 | 2023-01-01 | A | 28.0 | Product1 | 754.0 | East |
| 1 | 2023-01-02 | B | 39.0 | Product3 | 110.0 | North |
| 2 | 2023-01-03 | C | 32.0 | Product2 | 398.0 | East |
| 3 | 2023-01-04 | B | 8.0 | Product1 | 522.0 | East |
| 4 | 2023-01-05 | B | 26.0 | Product3 | 869.0 | North |
| 5 | 2023-01-06 | B | 54.0 | Product3 | 192.0 | West |
| 6 | 2023-01-07 | A | 16.0 | Product1 | 936.0 | East |
| 7 | 2023-01-08 | C | 89.0 | Product1 | 488.0 | West |
| 8 | 2023-01-09 | C | 37.0 | Product3 | 772.0 | West |
| 9 | 2023-01-10 | A | 22.0 | Product2 | 834.0 | West |


In [98]:
df["Sales_fillna"] = df["Sales"].fillna(df["Sales"].mean())

In [99]:
df["Sales_fillna"]

0     754.000000
1     110.000000
2     398.000000
3     522.000000
4     869.000000
5     192.000000
6     936.000000
7     488.000000
8     772.000000
9     834.000000
10    842.000000
11    557.130435
12    628.000000
13    423.000000
14    893.000000
15    895.000000
16    511.000000
17    108.000000
18    578.000000
19    736.000000
20    606.000000
21    992.000000
22    942.000000
23    342.000000
24    458.000000
25    584.000000
26    619.000000
27    224.000000
28    617.000000
29    737.000000
30    735.000000
31    189.000000
32    338.000000
33    557.130435
34    669.000000
35    557.130435
36    177.000000
37    557.130435
38    408.000000
39    155.000000
40    578.000000
41    256.000000
42    164.000000
43    949.000000
44    830.000000
45    599.000000
46    938.000000
47    143.000000
48    182.000000
49    708.000000
Name: Sales_fillna, dtype: float64

In [100]:
df.isnull().sum()

Date            0
Category        0
Value           3
Product         0
Sales           4
Region          0
Sales_fillna    0
dtype: int64

In [101]:
# df["Sales"].fillna(df["Sales"].mean(), inplace=True) triggers a FutureWarning:
# chained assignment with an inplace method will stop working in pandas 3.0,
# because df["Sales"] behaves as a copy. Assign the result back instead:
df["Sales"] = df["Sales"].fillna(df["Sales"].mean())


In [102]:
df.isnull().sum()

Date            0
Category        0
Value           3
Product         0
Sales           0
Region          0
Sales_fillna    0
dtype: int64
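Both columns with gaps can also be handled in one call by passing `fillna` a dictionary of per-column fill values, or you can drop incomplete rows instead. A sketch on a small made-up frame (named `demo` so it doesn't overwrite the notebook's `df`):

```python
import numpy as np
import pandas as pd

# Small illustrative frame with gaps in both numeric columns
demo = pd.DataFrame({"Value": [10.0, np.nan, 30.0], "Sales": [np.nan, 200.0, 400.0]})

# Fill each column with its own rule in one call, with no inplace or chained assignment
filled = demo.fillna({"Value": demo["Value"].mean(), "Sales": 0})

# Or simply drop any row that has at least one missing value
complete_rows = demo.dropna()
```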

In [103]:
df.columns

Index(['Date', 'Category', 'Value', 'Product', 'Sales', 'Region',
       'Sales_fillna'],
      dtype='object')

In [104]:
# Renaming the columns 

df = df.rename(columns={"Date" : "Sale Date"})

In [105]:
## change datatype 

df["New Val"] = df["Value"].fillna(df["Value"].mean()).astype(int)

In [106]:
df.head()

|   | Sale Date | Category | Value | Product | Sales | Region | Sales_fillna | New Val |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-01 | A | 28.0 | Product1 | 754.0 | East | 754.0 | 28 |
| 1 | 2023-01-02 | B | 39.0 | Product3 | 110.0 | North | 110.0 | 39 |
| 2 | 2023-01-03 | C | 32.0 | Product2 | 398.0 | East | 398.0 | 32 |
| 3 | 2023-01-04 | B | 8.0 | Product1 | 522.0 | East | 522.0 | 8 |
| 4 | 2023-01-05 | B | 26.0 | Product3 | 869.0 | North | 869.0 | 26 |


In [117]:
df["New Val"] = df["New Val"].astype(float)
df.head()

|   | Sale Date | Category | Value | Product | Sales | Region | Sales_fillna | New Val | val new |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-01 | A | 28.0 | Product1 | 754.0 | East | 754.0 | 28.0 | 38.0 |
| 1 | 2023-01-02 | B | 39.0 | Product3 | 110.0 | North | 110.0 | 39.0 | 49.0 |
| 2 | 2023-01-03 | C | 32.0 | Product2 | 398.0 | East | 398.0 | 32.0 | 42.0 |
| 3 | 2023-01-04 | B | 8.0 | Product1 | 522.0 | East | 522.0 | 8.0 | 18.0 |
| 4 | 2023-01-05 | B | 26.0 | Product3 | 869.0 | North | 869.0 | 26.0 | 36.0 |


In [119]:
# Apply a function to every element of a column with .apply(func)

df["val new"] = df["Value"].fillna(df["Value"].mean()).apply(lambda x :x + 10 )
df.head()

|   | Sale Date | Category | Value | Product | Sales | Region | Sales_fillna | New Val | val new |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-01 | A | 28.0 | Product1 | 754.0 | East | 754.0 | 28.0 | 38.0 |
| 1 | 2023-01-02 | B | 39.0 | Product3 | 110.0 | North | 110.0 | 39.0 | 49.0 |
| 2 | 2023-01-03 | C | 32.0 | Product2 | 398.0 | East | 398.0 | 32.0 | 42.0 |
| 3 | 2023-01-04 | B | 8.0 | Product1 | 522.0 | East | 522.0 | 8.0 | 18.0 |
| 4 | 2023-01-05 | B | 26.0 | Product3 | 869.0 | North | 869.0 | 26.0 | 36.0 |
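`.apply` is handy when the operation is an arbitrary Python function. For simple arithmetic like "+ 10", the vectorized form gives the same result and is usually faster, because it operates on the whole column at once instead of calling a lambda per element. A sketch with a few made-up values:

```python
import pandas as pd

values = pd.Series([28.0, 39.0, None, 8.0])
filled = values.fillna(values.mean())         # mean of 28, 39, 8 is 25.0

with_apply = filled.apply(lambda x: x + 10)   # calls the lambda once per element
vectorized = filled + 10                      # same result, computed on the whole column
```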


In [120]:
group_sum = df.groupby("Product")["Value"].sum()
group_sum

Product
Product1    647.0
Product2    792.0
Product3    993.0
Name: Value, dtype: float64

In [121]:
group_mean = df.groupby("Product")["Value"].mean()
group_mean

Product
Product1    46.214286
Product2    52.800000
Product3    55.166667
Name: Value, dtype: float64

In [69]:
grouped_sum = df.groupby(["Product", "Region"])["Value"].sum()
print(grouped_sum)

Product   Region
Product1  East      292.0
          North       9.0
          South     100.0
          West      246.0
Product2  East       56.0
          North     127.0
          South     181.0
          West      428.0
Product3  East      202.0
          North     203.0
          South     215.0
          West      373.0
Name: Value, dtype: float64


In [116]:
## Aggregate Multiple Functions 

grouped_agg = df.groupby(["Region"])["Value"].agg(["mean" , "median" , "sum" , "count"])
grouped_agg



| Region | mean | median | sum | count |
|---|---|---|---|---|
| East | 42.307692 | 32.0 | 550.0 | 13 |
| North | 37.666667 | 39.0 | 339.0 | 9 |
| South | 62.0 | 66.5 | 496.0 | 8 |
| West | 61.588235 | 60.0 | 1047.0 | 17 |
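When you need different functions on different columns, "named aggregation" lets you choose the output column names too. A sketch on a small made-up frame (named `demo`; the numbers are illustrative):

```python
import pandas as pd

demo = pd.DataFrame({
    "Region": ["East", "West", "East", "West", "North"],
    "Value": [10.0, 20.0, 30.0, 40.0, 50.0],
    "Sales": [100.0, 200.0, 300.0, 400.0, 500.0],
})

# Each keyword is an output column: new_name=(input_column, function)
summary = demo.groupby("Region").agg(
    mean_value=("Value", "mean"),
    total_sales=("Sales", "sum"),
    n_rows=("Value", "count"),
)
print(summary)
```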


In [71]:
df1 = pd.DataFrame({"key" : ["a" , "b", "c"] , "Value1" : [1, 2, 3]})

df2 = pd.DataFrame({"key": ["a","b", "c"] , "Value2" : [4, 5, 6]})

In [72]:
df1

|   | key | Value1 |
|---|---|---|
| 0 | a | 1 |
| 1 | b | 2 |
| 2 | c | 3 |


In [73]:
df2

|   | key | Value2 |
|---|---|---|
| 0 | a | 4 |
| 1 | b | 5 |
| 2 | c | 6 |


In [74]:
merged_data = pd.merge(df1 , df2 , on = "key" , how = "inner") 

In [75]:
merged_data

|   | key | Value1 | Value2 |
|---|---|---|---|
| 0 | a | 1 | 4 |
| 1 | b | 2 | 5 |
| 2 | c | 3 | 6 |


In [76]:
df1 = pd.DataFrame({"key" : ["a" , "b" , "c" , "d"] , "Value1" : [1, 2, 3, 4]})

df2 = pd.DataFrame({"key" : ["a" , "b" , "e" , "f"], "Value2" : [5, 6, 7, 8]})


In [77]:
df1

|   | key | Value1 |
|---|---|---|
| 0 | a | 1 |
| 1 | b | 2 |
| 2 | c | 3 |
| 3 | d | 4 |


In [78]:
df2

|   | key | Value2 |
|---|---|---|
| 0 | a | 5 |
| 1 | b | 6 |
| 2 | e | 7 |
| 3 | f | 8 |


In [79]:
merged_new = pd.merge(df1 , df2 , on = "key" , how = "inner")
merged_new

|   | key | Value1 | Value2 |
|---|---|---|---|
| 0 | a | 1 | 5 |
| 1 | b | 2 | 6 |


In [80]:
merged_new = pd.merge(df1 , df2 , on = "key" , how = "left")
merged_new

|   | key | Value1 | Value2 |
|---|---|---|---|
| 0 | a | 1 | 5.0 |
| 1 | b | 2 | 6.0 |
| 2 | c | 3 | NaN |
| 3 | d | 4 | NaN |


In [81]:
merged_new = pd.merge(df1 , df2 , on = "key" , how = "right")
merged_new

|   | key | Value1 | Value2 |
|---|---|---|---|
| 0 | a | 1.0 | 5 |
| 1 | b | 2.0 | 6 |
| 2 | e | NaN | 7 |
| 3 | f | NaN | 8 |


In [82]:
merged_new = pd.merge(df1 , df2 , on = "key" , how = "outer")
merged_new

|   | key | Value1 | Value2 |
|---|---|---|---|
| 0 | a | 1.0 | 5.0 |
| 1 | b | 2.0 | 6.0 |
| 2 | c | 3.0 | NaN |
| 3 | d | 4.0 | NaN |
| 4 | e | NaN | 7.0 |
| 5 | f | NaN | 8.0 |
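To see where each row of a merge came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column with the values "both", "left_only", or "right_only". A sketch that rebuilds the same two small frames under new names (`left`, `right`) so the notebook's `df1`/`df2` are untouched:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c", "d"], "Value1": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["a", "b", "e", "f"], "Value2": [5, 6, 7, 8]})

# indicator=True adds a "_merge" column describing each row's origin
merged = pd.merge(left, right, on="key", how="outer", indicator=True)
print(merged[["key", "_merge"]])
```

This is a quick way to check how many keys matched before deciding between an inner, left, right, or outer join.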


In [83]:
!pip install lxml
!pip install beautifulsoup4 
!pip install html5lib
!pip install openpyxl




In [84]:
url = "https://www.fdic.gov/resources/resolutions/bank-failures/in-brief/index"

df = pd.read_html(url)  # returns a list of DataFrames, one per HTML table found on the page
df

[     Years                       Bank Failures Total Assets (Millions)
 0     2009                United Security Bank                  $157.0
 1     2009                     Bank of Wyoming                   $70.0
 2     2010                      Woodlands Bank                  $376.2
 3     2016        The Woodbury Banking Company                   $21.4
 4     2009      First State Bank of Winchester                   $36.0
 ..     ...                                 ...                     ...
 567   2023                      Signature Bank              $110,400.0
 568   2023                 Silicon Valley Bank              $209,000.0
 569   2024  The First National Bank of Lindsay                  $107.8
 570   2025                Pulaski Savings Bank                   $49.5
 571   2025        The Santa Anna National Bank                   $63.8
 
 [572 rows x 3 columns]]

In [85]:
df[0]

|   | Years | Bank Failures | Total Assets (Millions) |
|---|---|---|---|
| 0 | 2009 | United Security Bank | $157.0 |
| 1 | 2009 | Bank of Wyoming | $70.0 |
| 2 | 2010 | Woodlands Bank | $376.2 |
| 3 | 2016 | The Woodbury Banking Company | $21.4 |
| 4 | 2009 | First State Bank of Winchester | $36.0 |
| ... | ... | ... | ... |
| 567 | 2023 | Signature Bank | $110,400.0 |
| 568 | 2023 | Silicon Valley Bank | $209,000.0 |
| 569 | 2024 | The First National Bank of Lindsay | $107.8 |
| 570 | 2025 | Pulaski Savings Bank | $49.5 |


In [86]:
url = "https://en.wikipedia.org/wiki/Mobile_country_code"
pd.read_html(url , match = "Country" , header = 0)[0]



|   | Mobile country code | Country | ISO 3166 | Mobile network codes | National MNC authority | Remarks |
|---|---|---|---|---|---|---|
| 0 | 289 | A Abkhazia | GE-AB | List of mobile network codes in Abkhazia | NaN | MCC is not listed by ITU |
| 1 | 412 | Afghanistan | AF | List of mobile network codes in Afghanistan | NaN | NaN |
| 2 | 276 | Albania | AL | List of mobile network codes in Albania | NaN | NaN |
| 3 | 603 | Algeria | DZ | List of mobile network codes in Algeria | NaN | NaN |
| 4 | 544 | American Samoa (United States of America) | AS | List of mobile network codes in American Samoa | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... |
| 247 | 452 | Vietnam | VN | List of mobile network codes in the Vietnam | NaN | NaN |
| 248 | 543 | W Wallis and Futuna | WF | List of mobile network codes in Wallis and Futuna | NaN | NaN |
| 249 | 421 | Y Yemen | YE | List of mobile network codes in the Yemen | NaN | NaN |
| 250 | 645 | Z Zambia | ZM | List of mobile network codes in Zambia | NaN | NaN |


In [87]:
pd.read_excel("data.xlsx")

|   | Name | Age |
|---|---|---|
| 0 | pratik | 21 |
| 1 | akshay | 21 |
| 2 | pushkar | 21 |
| 3 | nital | 24 |
| 4 | bhagwati | 18 |
