## Loading and Saving Data (CSV, JSON, APIs, etc.)

- Most real-world data lives in files (CSV, Excel, JSON) or in databases.
- Every project starts with **loading** data and often ends with **saving** results.

### Load and Save Data from CSV Files

In [143]:
import pandas as pd

In [144]:
%pwd

'/home/as/practice/Python-for-Machine-Learning/Datasets'

In [145]:
cd ..

/home/as/practice/Python-for-Machine-Learning


In [146]:
# Load the dataset
df = pd.read_csv('Datasets/airbnb_listings.csv')

In [147]:
df.head(2)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,...,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,2992450,https://www.airbnb.com/rooms/2992450,20250609011619,2025-06-09,city scrape,Luxury 2 bedroom apartment,The apartment is located in a quiet neighborho...,,https://a0.muscache.com/pictures/44627226/0e72...,4621559,...,4.56,3.22,3.67,,f,1,1,0,0,0.07
1,3820211,https://www.airbnb.com/rooms/3820211,20250609011619,2025-06-09,city scrape,Restored Precinct in Center Sq. w/Parking,"Cozy, cool little 1BR Apt in the heart Albany'...","Great restaurants, architecture, walking, peop...",https://a0.muscache.com/pictures/prohost-api/H...,19648678,...,4.81,4.83,4.78,,f,4,4,0,0,2.34


In [148]:
# Save the DataFrame to a CSV file
# (to_csv returns None when given a path, so there is nothing to assign)
df.to_csv('Datasets/airbnb_listings_copy.csv', index=False)

Note: 
- Use `pd.read_csv('file_path')` to load CSV files.
- Use `df.to_csv('file_path', index=False)` to save DataFrames to CSV.
- `index=False` prevents saving the DataFrame index as a column.
- Prefer CSV for large datasets where possible, because Excel (`.xlsx`) files are slower to read and write and are limited to 1,048,576 rows per sheet.
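
The effect of `index=False` can be checked with a small round trip. The sketch below uses a made-up two-row DataFrame and an in-memory buffer (`io.StringIO`) instead of the Airbnb file, so the column names and values are illustrative assumptions only.

```python
import io

import pandas as pd

# Hypothetical toy data standing in for the real listings file
toy = pd.DataFrame({"id": [101, 102], "price": [80.0, 120.5]})

# Default behaviour: the index is written as an unnamed first column
with_index = toy.to_csv()
reloaded_with_index = pd.read_csv(io.StringIO(with_index))
print(reloaded_with_index.columns.tolist())  # includes 'Unnamed: 0'

# index=False: only the real columns are written
without_index = toy.to_csv(index=False)
reloaded_clean = pd.read_csv(io.StringIO(without_index))
print(reloaded_clean.columns.tolist())

# If a file already contains a saved index, index_col=0 restores it on load
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(restored.columns.tolist())
```

This is probably why the `df.head(2)` output above shows an `Unnamed: 0` column: the source CSV appears to have been saved together with its index.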

### Loading JSON data from an API 

In [149]:
import requests

# API endpoint
url = 'https://jsonplaceholder.typicode.com/users'

# Get JSON data from API
response = requests.get(url)
data = response.json()

print(type(data))

<class 'list'>


In [150]:
# Load JSON into Pandas DataFrame
df = pd.DataFrame(data)

# Print the first few rows
print(df.head(2))

   id           name   username              email  \
0   1  Leanne Graham       Bret  Sincere@april.biz   
1   2   Ervin Howell  Antonette  Shanna@melissa.tv   

                                             address                  phone  \
0  {'street': 'Kulas Light', 'suite': 'Apt. 556',...  1-770-736-8031 x56442   
1  {'street': 'Victor Plains', 'suite': 'Suite 87...    010-692-6593 x09125   

         website                                            company  
0  hildegard.org  {'name': 'Romaguera-Crona', 'catchPhrase': 'Mu...  
1  anastasia.net  {'name': 'Deckow-Crist', 'catchPhrase': 'Proac...  
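
The `address` and `company` columns above hold nested dictionaries rather than plain values. One way to flatten them is `pd.json_normalize`, sketched here on a single hand-written record shaped like the API response. The values are placeholders, not fetched data.

```python
import pandas as pd

# Hand-written record mimicking the nested shape of the API response (assumed sample)
sample = [
    {
        "id": 1,
        "name": "Example User",
        "address": {"street": "Main St", "city": "Springfield"},
        "company": {"name": "Example Co"},
    }
]

# Nested keys become dotted column names such as 'address.city'
flat = pd.json_normalize(sample)
print(flat.columns.tolist())
```

The same call should work on the `data` list returned by `response.json()`, since it is also a list of dictionaries.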


In [151]:
cd Datasets

/home/as/practice/Python-for-Machine-Learning/Datasets


In [152]:
# Save the DataFrame to a JSON file as a pretty-printed array of records
df.to_json('users.json', orient='records', indent=4)

Note:

- Use `requests` to fetch JSON data from APIs.
- Use `response.json()` to convert the response to a Python object (list or dict).

- Load it into a Pandas DataFrame with `pd.DataFrame(data)`.

- Save it to a JSON file with `df.to_json('file.json', orient='records', indent=4)`.
    - `orient='records'` writes each row as a separate JSON object inside one array, which matches the shape most APIs return.
    - `indent=4` makes the file easier to read by adding indentation.
    - `lines=True` writes one compact JSON object per line (JSON Lines), which suits large files or streaming. Do not combine it with `indent`, because an indented object spans several lines and the result is no longer valid JSON Lines.

- Use `pd.read_json('file.json')` to load the JSON file back into a DataFrame; for a JSON Lines file, pass `lines=True` as well.
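
A round trip makes the two common JSON layouts concrete. This sketch writes a small made-up DataFrame to an in-memory string in both forms and reads each one back. Calling `to_json` without a path returns the JSON text instead of writing a file.

```python
import io

import pandas as pd

# Hypothetical toy data (not the API response)
people = pd.DataFrame({"id": [1, 2], "name": ["Ada", "Linus"]})

# Layout 1: one pretty-printed JSON array of row objects
as_array = people.to_json(orient="records", indent=4)
from_array = pd.read_json(io.StringIO(as_array))

# Layout 2: JSON Lines, one compact object per line
as_lines = people.to_json(orient="records", lines=True)
from_lines = pd.read_json(io.StringIO(as_lines), lines=True)

print(as_lines)
```

The reader has to match the writer: a JSON Lines file needs `lines=True` on load, while a regular array does not.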

### Loading Data from Google Drive

In [153]:
%pip install gdown

Note: you may need to restart the kernel to use updated packages.


In [154]:
import gdown

In [155]:
# Note: the URL must be a public shareable link from Google Drive
url = 'https://drive.google.com/file/d/1p3-HMfa13rjQRKFUD5kGH92gzimt3yyV/view?usp=sharing'

print(type(url))

<class 'str'>


In [156]:
# Split the URL on "/"; the file ID is the second-to-last element
url.split("/")

['https:',
 '',
 'drive.google.com',
 'file',
 'd',
 '1p3-HMfa13rjQRKFUD5kGH92gzimt3yyV',
 'view?usp=sharing']

In [157]:
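# Download a file from Google Drive using gdown

def download_from_gdrive(url: str, output: str = "pandas.zip"):
    """Download a publicly shared Google Drive file to `output`."""
    # Assumes a link of the form .../file/d/<file_id>/view?...,
    # so the file ID is the second-to-last path segment
    drive_id = url.split('/')[-2]
    # Direct-download URL; query parameters follow "?" with no extra "/"
    prefix = 'https://drive.google.com/uc?export=download&id='
    gdown.download(prefix + drive_id, output)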

In [158]:
download_from_gdrive(url)

Downloading...
From: https://drive.google.com/uc?/export=download&id=1p3-HMfa13rjQRKFUD5kGH92gzimt3yyV
To: /home/as/practice/Python-for-Machine-Learning/Datasets/pandas.zip
100%|██████████| 34.0k/34.0k [00:00<00:00, 2.05MB/s]
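
Taking the second-to-last path segment works for links of the form `.../file/d/<id>/view`, but it returns the wrong piece if the link has a different shape. A slightly more defensive sketch, using only the standard library, is shown below. The helper name `extract_drive_id` and the two supported link shapes are assumptions for illustration and are not part of gdown.

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_drive_id(url: str) -> str:
    """Return the file ID from a Google Drive share link (hypothetical helper)."""
    parsed = urlparse(url)

    # Shape 1: https://drive.google.com/file/d/<id>/view?usp=sharing
    match = re.search(r"/file/d/([^/]+)", parsed.path)
    if match:
        return match.group(1)

    # Shape 2: https://drive.google.com/uc?id=<id> or .../open?id=<id>
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]

    # Fail loudly instead of passing a wrong ID to the downloader
    raise ValueError(f"Could not find a file ID in: {url}")


share_link = "https://drive.google.com/file/d/1p3-HMfa13rjQRKFUD5kGH92gzimt3yyV/view?usp=sharing"
print(extract_drive_id(share_link))
```

The returned ID can then be appended to the download prefix in the same way `download_from_gdrive` does.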
