# 3.1 Getting Data from Files into a *pandas* DataFrame


- [Reading a Local CSV File](#Reading-a-Local-CSV-file)  
- [Reading a CSV File on the Web](#Reading-a-CSV-File-on-the-Web)  


- [Reading an Excel File: Single Worksheet](#Reading-Excel-File:-Single-Worksheet) 
- [Reading an Excel File: Multiple Worksheets](#Reading-an-Excel-File:-Multiple-Worksheets)  
- [Skip a number of the rows at the top of the worksheet](#Skip-a-number-of-the-rows-at-the-top-of-the-worksheet)  


- **Data Files Required**  
  - olympics.csv  
  - Data_Household.xlsx  
  - w3schools_Data_Updated.xlsx

# Note: Install Required!  
We need the **openpyxl** library installed to read `.xlsx` Excel files. (The **xlrd** library, from version 2.0 on, reads only the older `.xls` format.)

To install:

1. Go to the VS Code Terminal

2. Type:  **pip install openpyxl**

In [5]:
import pandas as pd

# Reading a Local CSV file

In [6]:
# Read the CSV file into a pandas dataframe
# Note: the file is in the Data folder inside this notebook's folder
df = pd.read_csv('Data/olympics.csv')

# Display the first five rows of the dataframe (uncomment to run)
#df.head()
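
As a minimal sketch of the same pattern, the snippet below builds the path with `pathlib` and reads a small CSV written to a temporary folder, so it runs without `olympics.csv`. The file name `sample.csv`, its columns, and its values are made-up placeholders, not the course data.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Write a tiny placeholder CSV into a temporary "Data" folder (hypothetical data)
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "Data"
    data_dir.mkdir()
    csv_path = data_dir / "sample.csv"
    csv_path.write_text("Item,Count\nA,1\nB,2\nC,3\n", encoding="utf-8")

    # read_csv accepts a Path object as well as a string
    df_sample = pd.read_csv(csv_path)

# head() returns the first five rows (here, all three)
print(df_sample.head())
print(df_sample.shape)
```

Building the path with `Path(...) / "name"` avoids hand-typed separators and behaves the same on Windows, macOS, and Linux.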

# Reading a CSV File on the Web
Data source:  https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/drinks.csv

In [7]:
# 1. Read the CSV file into a pandas dataframe
df = pd.read_csv('https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/drinks.csv')

# 2. Display the first five rows of the dataframe (uncomment to run)
#df.head()

# Reading Excel File: Single Worksheet  

In [8]:
# Set path to Excel file
file = 'Data/Data_Household.xlsx'

In [9]:
# Read the worksheet into its own dataframe
# (with no sheet specified, read_excel reads the first worksheet)
df_HouseHold = pd.read_excel(file)
df_HouseHold.head()

|   | Month | Category | Amount |
|---|---|---|---|
| 0 | January | Transportation | 74 |
| 1 | January | Grocery | 235 |
| 2 | January | Household | 175 |
| 3 | January | Entertainment | 100 |
| 4 | February | Transportation | 115 |


In [10]:
# Look at the data types for the household data
df_HouseHold.dtypes

Month       object
Category    object
Amount       int64
dtype: object

# Reading an Excel File: Multiple Worksheets

In [11]:
# Set path to Excel file
file = 'Data/w3schools_Data_Updated.xlsx'

### Get the worksheet names in the Excel file

In [12]:
# Open the Excel workbook
# (pd.ExcelFile returns an ExcelFile object, not a dataframe)
df = pd.ExcelFile(file)

# Display the WorkSheet names
df.sheet_names

['Customers',
 'Categories',
 'Employees',
 'OrderDetails',
 'Orders',
 'Products',
 'Shippers',
 'Suppliers']

### Read a worksheet into a pandas dataframe

In [13]:
# Read the data from the Categories worksheet into its own dataframe
df_W3_categories = pd.read_excel(file, sheet_name='Categories')
df_W3_categories.head()

|   | CategoryID | CategoryName | Description |
|---|---|---|---|
| 0 | 1 | Beverages | Soft drinks, coffees, teas, beers, and ales |
| 1 | 2 | Condiments | Sweet and savory sauces, relishes, spreads, an... |
| 2 | 3 | Confections | Desserts, candies, and sweet breads |
| 3 | 4 | Dairy Products | Cheeses |
| 4 | 5 | Grains/Cereals | Breads, crackers, pasta, and cereal |


# Skip a number of the rows at the top of the worksheet

In [14]:
# Three junk rows sit above the header in the spreadsheet.
# Reading the sheet without skipping them shows the problem:
df_W3_customers = pd.read_excel(file, sheet_name='Customers')
df_W3_customers.head()

|   | Customers Table | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 |
|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country |
| 2 | 1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany |
| 3 | 2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 5021 | Mexico |
| 4 | 3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 5023 | Mexico |
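
The `skiprows` parameter is one way to get past rows like these. The sketch below is a small stand-in rather than the actual workbook: it uses an in-memory CSV with three made-up junk rows above the header and `pd.read_csv`, so it runs without the Excel file. `pd.read_excel` accepts a `skiprows` argument as well, so a call such as `pd.read_excel(file, sheet_name='Customers', skiprows=3)` would follow the same idea, assuming the sheet really has three rows above its header.

```python
import io

import pandas as pd

# Hypothetical stand-in for a sheet with three junk rows above the real header:
# a title row, a note row, and an empty row.
raw_text = (
    "Customers Table,,\n"
    "Sample export,,\n"
    ",,\n"
    "CustomerID,CustomerName,Country\n"
    "1,Alfreds Futterkiste,Germany\n"
    "2,Ana Trujillo Emparedados y helados,Mexico\n"
)

# Without skiprows, the title row is taken as the header
df_raw = pd.read_csv(io.StringIO(raw_text))
print(df_raw.columns.tolist())

# With skiprows=3, the first three lines are ignored and line 4 becomes the header
df_clean = pd.read_csv(io.StringIO(raw_text), skiprows=3)
print(df_clean.columns.tolist())
print(df_clean)
```

The number passed to `skiprows` has to match the sheet being read, so it is worth checking the `head()` output after the call.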
