# Reading multiple data files

## Tools for pandas data import
- pd.read_csv() for CSV files
    - dataframe = pd.read_csv(filepath)
    - dozens of optional input parameters
- Other data import tools:
    - pd.read_excel()
    - pd.read_html()
    - pd.read_json()
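
As a minimal sketch of two of those optional parameters, the snippet below reads CSV text from an in-memory buffer rather than from a file. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content, kept in memory so no file on disk is needed.
csv_text = """Date,Units,Price
2015-01-05,3,9.99
2015-01-06,5,4.50
"""

# index_col selects the column used as the row index;
# parse_dates=True asks pandas to parse that index as dates.
sales = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)

print(sales)
print(type(sales.index).__name__)
```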

### Loading separate files


In [1]:
import pandas as pd

dataframe0 = pd.read_csv('Sales/sales-jan-2015.csv')

dataframe1 = pd.read_csv('Sales/sales-feb-2015.csv')

### Using a loop


In [2]:
filenames = ['Sales/sales-jan-2015.csv', 'Sales/sales-feb-2015.csv']

dataframes = []
for f in filenames:
    dataframes.append(pd.read_csv(f))

### Using a comprehension


In [5]:
filenames = ['Sales/sales-jan-2015.csv', 'Sales/sales-feb-2015.csv']

dataframes = [pd.read_csv(f) for f in filenames]


### Using glob


In [6]:
from glob import glob

filenames = glob('Sales/sales*.csv')

dataframes = [pd.read_csv(f) for f in filenames]
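
The glob pattern can be tried without the Sales folder. The sketch below writes two tiny CSV files (hypothetical contents) to a temporary directory and reads them back. Because `glob()` does not promise any particular ordering, the result is sorted first.

```python
import tempfile
from glob import glob
from pathlib import Path

import pandas as pd

# Build a throwaway directory holding two small CSV files (made-up data).
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / 'sales-jan-2015.csv').write_text('Units\n3\n5\n')
(tmp_dir / 'sales-feb-2015.csv').write_text('Units\n7\n')

# Sort the matches so the list order is reproducible.
filenames = sorted(glob(str(tmp_dir / 'sales*.csv')))

# One DataFrame per matched file, in the same order as filenames.
dataframes = [pd.read_csv(f) for f in filenames]

print([Path(f).name for f in filenames])
print([len(df) for df in dataframes])
```

Sorting is alphabetical, so the February file comes before the January file here. A different key would be needed for calendar order.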

# Let’s practice!

---
# Reindexing DataFrames

### “Indexes” vs. “Indices”
- indices: many index labels within Index data structures
- indexes: many pandas Index data structures
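
The distinction can be illustrated with two small DataFrames; the names and values below are made up for the example.

```python
import pandas as pd

# Two small DataFrames with hypothetical values.
df_a = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])
df_b = pd.DataFrame({'x': [4, 5]}, index=['b', 'c'])

# "Indexes": the Index objects themselves, one per DataFrame.
indexes = [df_a.index, df_b.index]

# "Indices": the individual labels stored inside one Index.
indices = list(df_a.index)

print(indexes)
print(indices)
```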

### Importing weather data

In [None]:
import pandas as pd
w_mean = pd.read_csv()

### Examining the data


### The DataFrame indexes

### Using .reindex()
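
A minimal sketch of `.reindex()`, assuming a small table of mean temperatures indexed by month abbreviation. The values are hypothetical and are not the course's weather data.

```python
import pandas as pd

# Hypothetical mean temperatures, stored in alphabetical month order.
w_mean = pd.DataFrame({'Mean TemperatureF': [60.0, 30.0, 70.0, 45.0]},
                      index=['Apr', 'Jan', 'Jul', 'Oct'])

# reindex() returns a new DataFrame whose rows follow the given label order.
ordered = ['Jan', 'Apr', 'Jul', 'Oct']
w_mean2 = w_mean.reindex(ordered)

print(w_mean2)
```

The original `w_mean` is left unchanged; only the returned DataFrame has the new row order.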


### Using .sort_index()
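
A minimal sketch of `.sort_index()`, again with hypothetical values. String labels are sorted alphabetically, which for month abbreviations is not calendar order.

```python
import pandas as pd

# Hypothetical mean temperatures in no particular order.
w_mean = pd.DataFrame({'Mean TemperatureF': [45.0, 30.0, 70.0, 60.0]},
                      index=['Oct', 'Jan', 'Jul', 'Apr'])

# sort_index() returns a new DataFrame with rows sorted by index label.
w_sorted = w_mean.sort_index()

print(w_sorted)
```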


### Reindex from a DataFrame Index
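
Instead of a list of labels, `.reindex()` also accepts the Index of another DataFrame. The sketch below assumes two hypothetical tables, `w_mean` and `w_max`, that share labels but store them in different orders.

```python
import pandas as pd

# Hypothetical data: w_mean is in calendar order, w_max in alphabetical order.
w_mean = pd.DataFrame({'Mean TemperatureF': [30.0, 60.0, 70.0, 45.0]},
                      index=['Jan', 'Apr', 'Jul', 'Oct'])
w_max = pd.DataFrame({'Max TemperatureF': [89, 68, 91, 84]},
                     index=['Apr', 'Jan', 'Jul', 'Oct'])

# Passing w_mean.index makes the result adopt w_mean's row order.
w_max2 = w_max.reindex(w_mean.index)

print(w_max2)
```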


### Reindexing with missing labels
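
When a requested label does not exist in the original index, `.reindex()` still creates the row and fills it with NaN. A minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical mean temperatures for four months.
w_mean = pd.DataFrame({'Mean TemperatureF': [30.0, 60.0, 70.0, 45.0]},
                      index=['Jan', 'Apr', 'Jul', 'Oct'])

# 'Dec' is not in the original index, so its row is filled with NaN.
w_mean3 = w_mean.reindex(['Jan', 'Apr', 'Dec'])

# dropna() removes the rows that reindexing could not fill.
w_mean4 = w_mean3.dropna()

print(w_mean3)
print(w_mean4)
```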


### Reindex from a DataFrame Index


### Order matters
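
Which DataFrame supplies the index determines both the labels and their order in the result. The sketch below uses two hypothetical tables, `left` and `right`, to show that reindexing in the two directions is not symmetric.

```python
import pandas as pd

# Hypothetical tables with partly overlapping labels in different orders.
left = pd.DataFrame({'a': [1, 2, 3]}, index=['Jan', 'Apr', 'Dec'])
right = pd.DataFrame({'b': [10, 20, 30, 40]},
                     index=['Apr', 'Jan', 'Jul', 'Oct'])

# right adopts left's labels: 'Dec' has no match and becomes NaN.
right_like_left = right.reindex(left.index)

# left adopts right's labels: 'Jul' and 'Oct' become NaN, 'Dec' is dropped.
left_like_right = left.reindex(right.index)

print(right_like_left)
print(left_like_right)
```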


---
# Let’s practice!

# Arithmetic with Series & DataFrames

### Loading weather data

In [17]:
weather = pd.read_csv('pittsburgh2013.csv',
                     index_col='Date',
                     parse_dates=True)

weather.loc['2013-7-1':'2013-7-7', 'PrecipitationIn']

Date
2013-07-01    0.18
2013-07-02    0.14
2013-07-03    0.00
2013-07-04    0.25
2013-07-05    0.02
2013-07-06    0.06
2013-07-07    0.10
Name: PrecipitationIn, dtype: float64

### Scalar multiplication

In [18]:
# Convert inches to centimeters (1 inch = 2.54 cm)
weather.loc['2013-7-1':'2013-7-7', 'PrecipitationIn'] * 2.54

Date
2013-07-01    0.4572
2013-07-02    0.3556
2013-07-03    0.0000
2013-07-04    0.6350
2013-07-05    0.0508
2013-07-06    0.1524
2013-07-07    0.2540
Name: PrecipitationIn, dtype: float64

### Absolute temperature range

In [19]:
week1_range = weather.loc['2013-07-01':'2013-07-07',
                          ['Min TemperatureF', 'Max TemperatureF']]

week1_range

            Min TemperatureF  Max TemperatureF
Date
2013-07-01                66                79
2013-07-02                66                84
2013-07-03                71                86
2013-07-04                70                86
2013-07-05                69                86
2013-07-06                70                89
2013-07-07                70                77


### Average temperature

In [20]:
week1_mean = weather.loc['2013-07-01':'2013-07-07',
                         'Mean TemperatureF']

week1_mean

Date
2013-07-01    72
2013-07-02    74
2013-07-03    78
2013-07-04    77
2013-07-05    76
2013-07-06    78
2013-07-07    72
Name: Mean TemperatureF, dtype: int64

### Relative temperature range

In [23]:
week1_range / week1_mean


            2013-07-01 00:00:00  2013-07-02 00:00:00  2013-07-03 00:00:00  2013-07-04 00:00:00  2013-07-05 00:00:00  2013-07-06 00:00:00  2013-07-07 00:00:00  Min TemperatureF  Max TemperatureF
Date
2013-07-01                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN               NaN               NaN
2013-07-02                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN               NaN               NaN
2013-07-03                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN               NaN               NaN
2013-07-04                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN               NaN               NaN
2013-07-05                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN               NaN               NaN
2013-07-06                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN               NaN               NaN
2013-07-07                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN               NaN               NaN


### Relative temperature range

In [24]:
week1_range.divide(week1_mean, axis='rows')

            Min TemperatureF  Max TemperatureF
Date
2013-07-01          0.916667          1.097222
2013-07-02          0.891892          1.135135
2013-07-03          0.910256          1.102564
2013-07-04          0.909091          1.116883
2013-07-05          0.907895          1.131579
2013-07-06          0.897436          1.141026
2013-07-07          0.972222          1.069444
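
The difference between `/` and `.divide(..., axis='rows')` comes down to which labels pandas aligns. The sketch below rebuilds the two objects from the values printed above, so it runs without the CSV file. Constructing them by hand and silencing the sorting warning are conveniences of this example, not part of the original workflow.

```python
import warnings

import pandas as pd

# Rebuild the week-one data from the values shown above.
dates = pd.date_range('2013-07-01', periods=7, name='Date')
week1_range = pd.DataFrame(
    {'Min TemperatureF': [66, 66, 71, 70, 69, 70, 70],
     'Max TemperatureF': [79, 84, 86, 86, 86, 89, 77]},
    index=dates)
week1_mean = pd.Series([72, 74, 78, 77, 76, 78, 72],
                       index=dates, name='Mean TemperatureF')

# The / operator matches the Series index against the DataFrame's columns.
# No date equals a column name, so every cell comes out as NaN.
with warnings.catch_warnings():
    # pandas may warn that the mixed date/string labels cannot be sorted.
    warnings.simplefilter('ignore')
    by_columns = week1_range / week1_mean

# axis='rows' matches the Series index against the DataFrame's row index.
by_rows = week1_range.divide(week1_mean, axis='rows')

print(by_columns.shape)
print(by_rows)
```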


### Percentage changes

In [25]:
week1_mean.pct_change() * 100

Date
2013-07-01         NaN
2013-07-02    2.777778
2013-07-03    5.405405
2013-07-04   -1.282051
2013-07-05   -1.298701
2013-07-06    2.631579
2013-07-07   -7.692308
Name: Mean TemperatureF, dtype: float64
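
To make the calculation explicit, the sketch below recomputes the percentage change by hand with `.shift()` and compares it with `.pct_change()`. The Series is rebuilt from the values printed above so the example runs on its own.

```python
import pandas as pd

# Rebuild the week-one mean temperatures from the values shown above.
week1_mean = pd.Series(
    [72, 74, 78, 77, 76, 78, 72],
    index=pd.date_range('2013-07-01', periods=7, name='Date'),
    name='Mean TemperatureF')

# shift(1) moves every value down one row, so each day sees the previous day.
previous = week1_mean.shift(1)

# Percentage change relative to the previous day, computed manually.
manual = (week1_mean - previous) / previous * 100

# The built-in equivalent; the first row is NaN because it has no predecessor.
builtin = week1_mean.pct_change() * 100

print(manual)
print(builtin)
```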

### Bronze Olympic medals

In [26]:
bronze = pd.read_csv('Summer Olympic medals/bronze_top5.csv', index_col=0)

bronze

                 Total
Country
United States   1052.0
Soviet Union     584.0
United Kingdom   505.0
France           475.0
Germany          454.0


### Silver Olympic medals

In [27]:
silver = pd.read_csv('Summer Olympic medals/silver_top5.csv', index_col=0)

silver

                 Total
Country
United States   1195.0
Soviet Union     627.0
United Kingdom   591.0
France           461.0
Italy            394.0


### Gold Olympic medals

In [28]:
gold = pd.read_csv('Summer Olympic medals/gold_top5.csv', index_col=0)

gold

                 Total
Country
United States   2088.0
Soviet Union     838.0
United Kingdom   498.0
Italy            460.0
Germany          407.0


### Adding bronze, silver

In [29]:
bronze + silver

                 Total
Country
France           936.0
Germany            NaN
Italy              NaN
Soviet Union    1211.0
United Kingdom  1096.0
United States   2247.0


In [36]:
bronze.loc['United States']

Total    1052.0
Name: United States, dtype: float64

In [40]:
silver.loc['United States']

Total    1195.0
Name: United States, dtype: float64

### Using the .add() method

In [41]:
bronze.add(silver)

                 Total
Country
France           936.0
Germany            NaN
Italy              NaN
Soviet Union    1211.0
United Kingdom  1096.0
United States   2247.0


### Using a fill_value

In [42]:
bronze.add(silver, fill_value=0)

                 Total
Country
France           936.0
Germany          454.0
Italy            394.0
Soviet Union    1211.0
United Kingdom  1096.0
United States   2247.0


### Adding bronze, silver, gold

In [43]:
bronze + silver + gold

                 Total
Country
France             NaN
Germany            NaN
Italy              NaN
Soviet Union    2049.0
United Kingdom  1594.0
United States   4335.0


### Chaining .add()

In [44]:
bronze.add(silver, fill_value=0).add(gold, fill_value=0)

                 Total
Country
France           936.0
Germany          861.0
Italy            854.0
Soviet Union    2049.0
United Kingdom  1594.0
United States   4335.0
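
The medal example can be reproduced without the CSV files. The sketch below rebuilds the three tables from the totals printed above and compares plain `+` with chained `.add(..., fill_value=0)`. The `medal_table` helper is hypothetical and exists only to keep the example short.

```python
import pandas as pd


def medal_table(totals):
    """Build a one-column DataFrame of medal totals indexed by country."""
    df = pd.DataFrame({'Total': pd.Series(totals, dtype='float64')})
    df.index.name = 'Country'
    return df


# Totals copied from the tables shown above.
bronze = medal_table({'United States': 1052, 'Soviet Union': 584,
                      'United Kingdom': 505, 'France': 475, 'Germany': 454})
silver = medal_table({'United States': 1195, 'Soviet Union': 627,
                      'United Kingdom': 591, 'France': 461, 'Italy': 394})
gold = medal_table({'United States': 2088, 'Soviet Union': 838,
                    'United Kingdom': 498, 'Italy': 460, 'Germany': 407})

# Plain +: a country missing from any one table ends up as NaN.
plain_sum = bronze + silver + gold

# fill_value=0 treats a country missing from one operand as having 0 medals.
filled_sum = bronze.add(silver, fill_value=0).add(gold, fill_value=0)

print(plain_sum)
print(filled_sum)
```

France, Germany, and Italy each appear in only two of the three tables, which is why they are NaN under plain `+` but have totals after the chained `.add()` calls.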


---
# Let’s practice!