
# Day 12 - Data Import and Export with Pandas

Importing and exporting data is a fundamental task in data science, whether working with local files, databases, or fetching data from the web.

Pandas offers a consistent set of `read_*` functions and `to_*` methods for these input and output operations, so loading or saving a dataset usually takes a single call. Mastering this workflow lets you manipulate datasets efficiently, share results, and automate repetitive data-loading tasks in larger projects.
    


## Reading and Writing Data to/from CSV Files

CSV (Comma-Separated Values) is a lightweight, easy-to-read format that is widely used for storing data. Pandas provides robust functions for reading and writing CSV files.

### Reading CSV Files

To load data from a CSV file into a Pandas DataFrame, use the `pd.read_csv()` function:
    

In [1]:

import pandas as pd

# Reading data from a CSV file (sales_data.csv must already exist in the working directory)
df = pd.read_csv('sales_data.csv')

# Displaying the first few rows of the DataFrame
print(df.head())
    

    Product  Quantity  Price  Sales Region
0    Laptop         5   1000   5000  North
1     Mouse        15     20    300   West
2  Keyboard        10     50    500   East
3   Monitor         8    200   1600  South
4    Laptop        12    950  11400  North
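
The cell above assumes that `sales_data.csv` already exists. As a minimal sketch, a stand-in file can be built from the five rows printed above. The tutorial's real file evidently contains more rows, so totals computed from this sample will differ from the outputs shown later. The temporary directory and the names `sample` and `sample_path` are illustrative choices, not part of the original notebook.

```python
import os
import tempfile

import pandas as pd

# The five rows printed above; the tutorial's full file has more rows.
sample = pd.DataFrame({
    "Product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Laptop"],
    "Quantity": [5, 15, 10, 8, 12],
    "Price": [1000, 20, 50, 200, 950],
    "Sales": [5000, 300, 500, 1600, 11400],
    "Region": ["North", "West", "East", "South", "North"],
})

# Written to a temporary directory here so nothing is overwritten;
# use 'sales_data.csv' as the path to follow along with the tutorial.
sample_path = os.path.join(tempfile.mkdtemp(), "sales_data.csv")
sample.to_csv(sample_path, index=False)

# Reading the file back gives the same table.
reloaded = pd.read_csv(sample_path)
print(reloaded)
```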



### Writing Data to CSV Files

To export a DataFrame back to a CSV file, use the `df.to_csv()` method. Passing `index=False` leaves the row index out of the file:
    

In [2]:

# Writing the DataFrame to a CSV file
df.to_csv('output.csv', index=False)

print("Data saved to output.csv")
    

Data saved to output.csv
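
To see what `index=False` changes, the sketch below compares both variants on a tiny, made-up DataFrame (`df_demo` is a hypothetical name). It relies on the fact that `to_csv()` returns the CSV text as a string when no path is given.

```python
import pandas as pd

df_demo = pd.DataFrame({"Product": ["Laptop", "Mouse"], "Quantity": [5, 15]})

# With no path argument, to_csv() returns the CSV text instead of writing a file.
with_index = df_demo.to_csv()
without_index = df_demo.to_csv(index=False)

print(with_index)     # header starts with an empty field for the unnamed index
print(without_index)  # header contains only the column names
```

Keeping the index is mainly useful when it carries information, such as dates or group labels.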



## Example: Analyzing a Sales Dataset

Let's apply these functions in a real-world scenario where we work with a sales dataset stored in a CSV file. We'll load the dataset, perform basic analysis, and export the results to a new CSV file.

### Step 1: Reading the Sales Data
    

In [3]:

# Reading sales data from a CSV file
sales_df = pd.read_csv('sales_data.csv')

# Displaying the first few rows of the dataset
print("First few rows of the sales data:")
print(sales_df.head())
    

First few rows of the sales data:
    Product  Quantity  Price  Sales Region
0    Laptop         5   1000   5000  North
1     Mouse        15     20    300   West
2  Keyboard        10     50    500   East
3   Monitor         8    200   1600  South
4    Laptop        12    950  11400  North
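
`pd.read_csv()` also accepts options that control what is loaded. The following sketch uses inline CSV text (copied from the first rows above) in place of the real file and shows three commonly used parameters: `usecols`, `dtype`, and `nrows`. The variable names are illustrative.

```python
import io

import pandas as pd

# Inline CSV text stands in for 'sales_data.csv' (first rows shown above).
csv_text = """Product,Quantity,Price,Sales,Region
Laptop,5,1000,5000,North
Mouse,15,20,300,West
Keyboard,10,50,500,East
"""

subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Product", "Sales", "Region"],  # load only these columns
    dtype={"Sales": "float64"},              # override the inferred integer type
    nrows=2,                                 # read only the first two data rows
)
print(subset)
print(subset.dtypes)
```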



### Step 2: Performing Basic Analysis

Now that we've loaded the data, let's perform some basic analysis to understand the dataset.

**Summarizing Sales by Region**: We'll calculate the total sales for each region by grouping the data and summing the sales for each group.
    

In [4]:

# Grouping by Region and calculating total sales
sales_by_region = sales_df.groupby('Region')['Sales'].sum()
print("Total sales by region:")
print(sales_by_region)

Total sales by region:
Region
East       770
North    16400
South     3070
West       650
Name: Sales, dtype: int64
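
If more than one statistic per region is needed, `agg()` with named aggregations can compute them in one pass. This is a sketch on the five sample rows shown earlier, not on the full dataset, so the numbers differ from the output above. The names `sales_sample`, `region_summary`, and the result column labels are hypothetical.

```python
import pandas as pd

# Five sample rows from the head() output; the full file has more rows.
sales_sample = pd.DataFrame({
    "Product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Laptop"],
    "Sales": [5000, 300, 500, 1600, 11400],
    "Region": ["North", "West", "East", "South", "North"],
})

# Named aggregation: each keyword becomes a column in the result.
region_summary = (
    sales_sample.groupby("Region")["Sales"]
    .agg(total="sum", rows="count", average="mean")
    .reset_index()  # turn 'Region' back into a regular column
)
print(region_summary)
```

Calling `reset_index()` produces a flat table, which can then be exported with `index=False`.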



**Identifying the Best-Selling Products**: We can also identify the top-selling products by calculating the total quantity sold for each product.
    

In [5]:

# Grouping by Product and calculating total quantity sold
best_selling_products = sales_df.groupby('Product')['Quantity'].sum().sort_values(ascending=False)
print("Best-selling products by quantity:")
print(best_selling_products)

Best-selling products by quantity:
Product
Mouse       29
Laptop      17
Keyboard    16
Monitor     15
Name: Quantity, dtype: int64
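
When only the top few products matter, `nlargest()` is a compact alternative to sorting the whole result. The sketch below again uses only the five sample rows shown earlier, so its totals are smaller than those above. `quantity_sample` and `top_two` are illustrative names.

```python
import pandas as pd

# Five sample rows from the head() output; the full file has more rows.
quantity_sample = pd.DataFrame({
    "Product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Laptop"],
    "Quantity": [5, 15, 10, 8, 12],
})

# Sum quantities per product, then keep the two largest totals.
top_two = quantity_sample.groupby("Product")["Quantity"].sum().nlargest(2)
print(top_two)
```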



### Step 3: Saving the Analysis Results to a CSV File

Once we've performed the analysis, we can export the results to a new CSV file for reporting or sharing.
    

In [6]:

# Saving the sales by region data to a CSV file
sales_by_region.to_csv('sales_by_region.csv')

# Saving the best-selling products data to a CSV file
best_selling_products.to_csv('best_selling_products.csv')

print("Analysis results saved to CSV files.")
    

Analysis results saved to CSV files.
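
Both results are Series, so `to_csv()` writes the index (here the region or product names) as the first column and the Series name as the header of the second. A minimal round-trip sketch, using the regional totals printed in Step 2 and an in-memory string instead of a file, shows how such a file could be read back. The names `sales_by_region_demo` and `restored` are hypothetical.

```python
import io

import pandas as pd

# Regional totals as printed in Step 2.
sales_by_region_demo = pd.Series(
    [770, 16400, 3070, 650],
    index=pd.Index(["East", "North", "South", "West"], name="Region"),
    name="Sales",
)

# With no path argument, to_csv() returns the text that would be written.
csv_text = sales_by_region_demo.to_csv()
print(csv_text)

# index_col restores 'Region' as the index; selecting 'Sales' yields a Series again.
restored = pd.read_csv(io.StringIO(csv_text), index_col="Region")["Sales"]
print(restored)
```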



## Tutorial: Importing Stock Market Data for Analysis

In this real-life scenario, we'll use the open-source `yfinance` library, which retrieves data from Yahoo Finance, to download historical stock market data for Alphabet (GOOGL), Google's parent company, and inspect it with Pandas. This example combines fetching, exploring, and exporting data.

### Step 1: Installing the yfinance Library
    

In [7]:
!pip install yfinance




### Step 2: Downloading Stock Market Data
    

In [8]:

import yfinance as yf
from datetime import datetime

# Downloading historical stock data for Google (GOOGL)
ticker = 'GOOGL'
stock_data = yf.download(ticker, start='2020-01-01', end=datetime.today())

# Displaying the first few rows of the DataFrame
print("First few rows of the Google stock market data:")
print(stock_data.head())
    

[*********************100%%**********************]  1 of 1 completed

First few rows of the Google stock market data:
                 Open       High        Low      Close  Adj Close    Volume
Date                                                                       
2020-01-02  67.420502  68.433998  67.324501  68.433998  68.264961  27278000
2020-01-03  67.400002  68.687500  67.365997  68.075996  67.907852  23408000
2020-01-06  67.581497  69.916000  67.550003  69.890503  69.717865  46768000
2020-01-07  70.023003  70.175003  69.578003  69.755501  69.583206  34330000
2020-01-08  69.740997  70.592499  69.631500  70.251999  70.078476  35314000






### Step 3: Basic Data Exploration
    

In [9]:

# Summary of the DataFrame
# info() prints its report directly and returns None, so it is not wrapped in print()
print("Summary of the Google stock market data:")
stock_data.info()

# Checking for missing values
print("Missing values in the Google stock market data:")
print(stock_data.isnull().sum())
    

Summary of the Google stock market data:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1181 entries, 2020-01-02 to 2024-09-11
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Open       1181 non-null   float64
 1   High       1181 non-null   float64
 2   Low        1181 non-null   float64
 3   Close      1181 non-null   float64
 4   Adj Close  1181 non-null   float64
 5   Volume     1181 non-null   int64  
dtypes: float64(5), int64(1)
memory usage: 64.6 KB
Missing values in the Google stock market data:
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64
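
The downloaded data has no gaps, but other datasets often do. As a hedged illustration of what the missing-value check reports and two common ways to respond, the sketch below uses three rows from the output above with one `Close` value deliberately removed. The frame `prices` and the choice between dropping and forward-filling are assumptions for demonstration only.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame(
    {
        "Close": [68.433998, np.nan, 69.890503],  # middle value removed on purpose
        "Volume": [27278000, 23408000, 46768000],
    },
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

# Count missing values per column.
missing_counts = prices.isnull().sum()
print(missing_counts)

# Two common responses: drop incomplete rows, or carry the last known value forward.
dropped = prices.dropna()
filled = prices.ffill()
print(filled)
```

Which option is appropriate depends on the analysis; forward-filling a price assumes it did not change on the missing day.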



### Step 4: Saving the Processed Data to a New CSV File
    

In [10]:

# Writing the DataFrame to a CSV file; index=True (the default) keeps the Date index as the first column
stock_data.to_csv('googl_stock_data.csv', index=True)

print("Google stock data saved to googl_stock_data.csv")

Google stock data saved to googl_stock_data.csv
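
When the saved file is loaded again later, the dates come back as plain text unless `read_csv()` is told otherwise. The sketch below assumes the flat, single-level column layout shown in Step 2 (the layout returned by `yfinance` may differ between versions) and uses two rows of that output as inline text rather than the real file. `stock_csv_text` and `stock_reloaded` are illustrative names.

```python
import io

import pandas as pd

# Two rows in the layout shown in Step 2, standing in for 'googl_stock_data.csv'.
stock_csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,67.420502,68.433998,67.324501,68.433998,68.264961,27278000
2020-01-03,67.400002,68.687500,67.365997,68.075996,67.907852,23408000
"""

# index_col makes 'Date' the index; parse_dates=True converts it to datetimes.
stock_reloaded = pd.read_csv(
    io.StringIO(stock_csv_text), index_col="Date", parse_dates=True
)
print(stock_reloaded.index)
```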
