# Loading Files using Pandas
---

#### *This notebook covers:*
1. Loading different types of files:
    * CSV files
    * JSON files
    * Excel files
    * TSV files
    * tables from a webpage
---

## Let's import necessary libraries

In [6]:
import pandas as pd

## 1. Loading csv files:
* CSV stands for **comma-separated values**.
* Load CSV files with the pandas `read_csv("path")` function.
* Pass the path of the CSV file as an argument to `read_csv("path")`.
* `read_csv("path")` returns a DataFrame object.
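
As a rough illustration of the points above, the sketch below reads a few rows laid out like `sales1.csv` from an in-memory string, so it runs without the file. The `usecols` and `nrows` arguments are optional extras that are not used elsewhere in this notebook; they limit which columns and how many rows are loaded.

```python
from io import StringIO

import pandas as pd

# A few rows in the same layout as sales1.csv, kept in memory so no file is needed.
csv_text = """Order ID,Customer Name,Product,Quantity
166837,Veeru,34in Ultrawide Monitor,2
166838,Tarun,Samsung m10,3
166839,Kedar,20in Monitor,1
"""

# read_csv accepts a path or any file-like object and returns a DataFrame.
sample = pd.read_csv(
    StringIO(csv_text),
    usecols=["Order ID", "Product", "Quantity"],  # load only these columns
    nrows=2,  # stop after the first two data rows
)

print(type(sample).__name__)
print(sample)
```

Reading a subset like this can be handy for a quick look at a large file before loading all of it.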

In [7]:
# Path to the CSV file (a raw string, so the backslashes are kept literally)
path_csv = r"..\datasets\sales1.csv"
df = pd.read_csv(path_csv)

print(df)

     Order ID Customer Name                 Product  Quantity
0      166837         Veeru  34in Ultrawide Monitor         2
1      166838         Tarun             Samsung m10         3
2      166839         Kedar            20in Monitor         1
3      166840       Lavanya               iPhone 11         3
4      166841          Venu      Macbook Pro Laptop         2
..        ...           ...                     ...       ...
595    167403        Balaji      Macbook Pro Laptop         1
596    167404       Lavanya         ThinkPad Laptop         1
597    167405          Venu           Flatscreen TV         1
598    167406        Siddhu             Samsung m20         2
599    167407         Tarun      LG Washing Machine         1

[600 rows x 4 columns]


---
## 2. Loading json files:
* JSON stands for **JavaScript Object Notation**.
* It stores data as key-value pairs.
* Load it with the pandas `read_json("path")` function.
* `read_json("path")` returns a pandas DataFrame object.
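
To make the key-value idea concrete, here is a minimal sketch that parses a small JSON document from memory. It assumes the data is a list of records (one object per row); the actual layout of `sales1.json` is not shown in this notebook, so treat that as an assumption.

```python
from io import StringIO

import pandas as pd

# Assumed layout: a JSON array where each object is one row (key = column name).
json_text = """[
  {"order_id": 16278939, "cust_name": "Lavanya", "product": "ThinkPad Laptop", "quantity": 2},
  {"order_id": 16278966, "cust_name": "Kedar", "product": "Flatscreen TV", "quantity": 1}
]"""

# Each key becomes a column and each object becomes a row of the DataFrame.
records = pd.read_json(StringIO(json_text))

print(records)
```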

In [8]:
path_json = r"..\datasets\sales1.json" 
df = pd.read_json(path_json)

print(df)

     order_id     cust_name                product  quantity
0    16278939       Lavanya        ThinkPad Laptop         2
1    16278966         Kedar          Flatscreen TV         1
2    16278993  Jaya Chandra     Macbook Pro Laptop         2
3    16279020   Mallikarjun              iPhone 11         3
4    16279047        Shahid     LG Washing Machine         1
..        ...           ...                    ...       ...
495  16292304         Sagar               iPhone 7         1
496  16292331    Chaithanya            Samsung m20         1
497  16292358        Siddhu               LG Dryer         2
498  16292385        Siddhu  AA Batteries (4-pack)         1
499  16292412         Sagar              iPhone 11         2

[500 rows x 4 columns]


---
## 3. Loading Excel Files:
* Use the pandas `read_excel("path")` function.
* Prerequisites: install `openpyxl` (used for `.xlsx` files) and `xlrd` (used for legacy `.xls` files) beforehand.
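
Before calling `read_excel`, it can help to check whether these packages are importable. The sketch below uses only the standard library; `missing_excel_engines` and `ENGINE_FOR_EXTENSION` are hypothetical names introduced for illustration and are not part of pandas.

```python
import importlib.util

# Package pandas relies on for each Excel file extension.
ENGINE_FOR_EXTENSION = {".xlsx": "openpyxl", ".xls": "xlrd"}


def missing_excel_engines():
    """Return the engine packages that cannot be imported in this environment."""
    return [
        engine
        for engine in ENGINE_FOR_EXTENSION.values()
        if importlib.util.find_spec(engine) is None
    ]


print("Missing Excel engines:", missing_excel_engines() or "none")
```

If the list is not empty, installing the named packages should let `read_excel` open the corresponding file type.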


In [9]:
%pip install xlrd
%pip install openpyxl






In [10]:
path_xlsx = r"..\datasets\sales1.xlsx" 
df = pd.read_excel(path_xlsx)

print(df)

     Order ID Customer Name                 Product  Quantity
0      166837         Veeru  34in Ultrawide Monitor         2
1      166838         Tarun             Samsung m10         3
2      166839         Kedar            20in Monitor         1
3      166840       Lavanya               iPhone 11         3
4      166841          Venu      Macbook Pro Laptop         2
..        ...           ...                     ...       ...
595    167403        Balaji      Macbook Pro Laptop         1
596    167404       Lavanya         ThinkPad Laptop         1
597    167405          Venu           Flatscreen TV         1
598    167406        Siddhu             Samsung m20         2
599    167407         Tarun      LG Washing Machine         1

[600 rows x 4 columns]


---
## 4. Loading tsv files:
* TSV stands for **tab-separated values**.
* Use the pandas `read_table("path")` function, which splits fields on tabs by default.
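
A small sketch of how this relates to `read_csv`: with its default settings, `read_table` behaves like `read_csv` called with a tab separator. The sample rows mirror the layout of `sales1.tsv` and are held in memory so the example runs without the file.

```python
from io import StringIO

import pandas as pd

# Two rows in the same layout as sales1.tsv, with fields separated by tabs.
tsv_text = (
    "Order ID\tCustomer Name\tProduct\tQuantity\n"
    "166837\tVeeru\t34in Ultrawide Monitor\t2\n"
    "166838\tTarun\tSamsung m10\t3\n"
)

via_table = pd.read_table(StringIO(tsv_text))        # tab is the default separator
via_csv = pd.read_csv(StringIO(tsv_text), sep="\t")  # same result, separator given explicitly

print(via_table.equals(via_csv))
```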

In [11]:
path_tsv = r"..\datasets\sales1.tsv" 
df = pd.read_table(path_tsv)

print(df)

     Order ID Customer Name                 Product  Quantity
0      166837         Veeru  34in Ultrawide Monitor         2
1      166838         Tarun             Samsung m10         3
2      166839         Kedar            20in Monitor         1
3      166840       Lavanya               iPhone 11         3
4      166841          Venu      Macbook Pro Laptop         2
..        ...           ...                     ...       ...
595    167403        Balaji      Macbook Pro Laptop         1
596    167404       Lavanya         ThinkPad Laptop         1
597    167405          Venu           Flatscreen TV         1
598    167406        Siddhu             Samsung m20         2
599    167407         Tarun      LG Washing Machine         1

[600 rows x 4 columns]


---
## 5. Loading tables from webpages:
* Use the pandas `read_html("url")` function.
* It takes the URL of the webpage as an argument and returns a list of DataFrames, one for each table found on the page.
* Prerequisites: install `lxml` beforehand.
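
Because the result is a list, a table is usually picked by position, which can break when the page layout changes. One possible alternative, sketched below, is to pick the table by its column names. `pick_table` is a hypothetical helper written for this example, and the two small DataFrames stand in for the list that `read_html` would return.

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame whose columns include every required name."""
    for table in tables:
        if set(required_columns).issubset(table.columns):
            return table
    raise ValueError(f"no table has columns {sorted(required_columns)}")


# Stand-ins for the list of DataFrames returned by pd.read_html(url).
tables = [
    pd.DataFrame({"Note": ["navigation box"]}),
    pd.DataFrame(
        {
            "No.": [1, 2],
            "Name": ["Bernard Arnault & family", "Elon Musk"],
            "Net worth (USD)": ["$233 billion", "$195 billion"],
        }
    ),
]

ranking = pick_table(tables, ["Name", "Net worth (USD)"])
print(ranking)
```

`read_html` also accepts a `match` argument that keeps only tables containing the given text, which may be enough on its own for simple pages.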

In [12]:
%pip install lxml






In [13]:
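url = 'https://en.wikipedia.org/wiki/The_World%27s_Billionaires'
# read_html returns a list of DataFrames, one per table on the page
tables = pd.read_html(url)

# Index 2 was the top-10 list when this notebook was run; the position can change if the page is edited
print(tables[2])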

   No.                      Name Net worth (USD)  Age  \
0    1  Bernard Arnault & family    $233 billion   75   
1    2                 Elon Musk    $195 billion   52   
2    3                Jeff Bezos    $194 billion   60   
3    4           Mark Zuckerberg    $177 billion   39   
4    5             Larry Ellison    $141 billion   79   
5    6            Warren Buffett    $133 billion   93   
6    7                Bill Gates    $128 billion   68   
7    8             Steve Ballmer    $121 billion   68   
8    9             Mukesh Ambani    $116 billion   66   
9   10                Larry Page    $114 billion   51   

                           Nationality Primary source(s) of wealth  
0                               France                        LVMH  
1  South Africa  Canada  United States               Tesla, SpaceX  
2                        United States                      Amazon  
3                        United States              Meta Platforms  
4                        Un

---