# Exercise 6: Reading Tabular Data into DataFrames

## Aim: Learn what DataFrames are and practice using them.

### Issues covered:
- Importing the `pandas` library
- Using `pandas` to load a simple CSV data set
- Getting information about the DataFrames we make

## 1. Let's import `pandas` and make some DataFrames.

Import `pandas`, then create a dataframe using the `data/weather.csv` file and print it out.

In [1]:
import pandas as pd
weather_data = pd.read_csv('../data/weather.csv')
print(weather_data)

         Date   Time  Temp  Rainfall
0  2014-01-01  00:00  2.34      4.45
1  2014-01-01  12:00  6.70      8.34
2  2014-01-02  00:00 -1.34     10.25


Create a new dataframe which indexes by `Date` and print it.

In [2]:
weather_data_dates = pd.read_csv('../data/weather.csv', index_col='Date')
print(weather_data_dates)

             Time  Temp  Rainfall
Date                             
2014-01-01  00:00  2.34      4.45
2014-01-01  12:00  6.70      8.34
2014-01-02  00:00 -1.34     10.25
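
If the data has already been loaded, `set_index` should give the same result as passing `index_col` to `read_csv`. The sketch below is only an illustration: it reads the three sample rows from an in-memory string (an assumption, so that it runs without `weather.csv`) and compares the two approaches.

```python
import io

import pandas as pd

# The three sample rows shown above, inlined so no file is needed (assumption).
csv_text = """Date,Time,Temp,Rainfall
2014-01-01,00:00,2.34,4.45
2014-01-01,12:00,6.70,8.34
2014-01-02,00:00,-1.34,10.25
"""

# Option 1: choose the index column while reading.
indexed_on_read = pd.read_csv(io.StringIO(csv_text), index_col='Date')

# Option 2: read first, then move the 'Date' column into the index.
plain = pd.read_csv(io.StringIO(csv_text))
indexed_after = plain.set_index('Date')

# Compare the two DataFrames (values, dtypes and index).
print(indexed_on_read.equals(indexed_after))
```

Note that `set_index` returns a new DataFrame and leaves `plain` unchanged.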


## 2. Let's practice using some dataframe methods.

What is the memory usage of the dataframe in bytes?

In [3]:
weather_data_dates.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 2014-01-01 to 2014-01-02
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Time      3 non-null      object 
 1   Temp      3 non-null      float64
 2   Rainfall  3 non-null      float64
dtypes: float64(2), object(1)
memory usage: 96.0+ bytes
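
The `+` in `96.0+ bytes` marks the figure as a lower bound: by default, the strings held in `object` columns such as `Time` are not measured. One way to get a fuller estimate is `memory_usage(deep=True)`. The sketch below assumes the same three rows, inlined as a string. Exact byte counts depend on the pandas version and platform, so none are shown.

```python
import io

import pandas as pd

# The three sample rows, inlined so no file is needed (assumption).
csv_text = """Date,Time,Temp,Rainfall
2014-01-01,00:00,2.34,4.45
2014-01-01,12:00,6.70,8.34
2014-01-02,00:00,-1.34,10.25
"""
weather = pd.read_csv(io.StringIO(csv_text), index_col='Date')

# Bytes per column (plus the index), without inspecting string contents.
shallow = weather.memory_usage()

# deep=True also measures the strings stored in object columns.
deep = weather.memory_usage(deep=True)

print(shallow.sum(), deep.sum())
```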


What command can you use to find the dataframe's column names?

In [4]:
print(weather_data_dates.columns)

Index(['Time', 'Temp', 'Rainfall'], dtype='object')
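
`columns` returns a pandas `Index` rather than a plain list. As a small sketch, using a one-row stand-in DataFrame with the same column names (an assumption for illustration), it can be converted or queried like this:

```python
import pandas as pd

# A one-row stand-in with the same column names as the weather data.
df = pd.DataFrame({'Time': ['00:00'], 'Temp': [2.34], 'Rainfall': [4.45]})

# Convert the Index to a plain Python list when one is needed.
column_names = df.columns.tolist()
print(column_names)  # ['Time', 'Temp', 'Rainfall']

# Membership tests work directly on the Index.
print('Temp' in df.columns)  # True
```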


Swap the rows and columns and `print` the result.

In [5]:
print(weather_data_dates.T)

Date     2014-01-01 2014-01-01 2014-01-02
Time          00:00      12:00      00:00
Temp           2.34        6.7      -1.34
Rainfall       4.45       8.34      10.25
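
Notice that `6.70` is printed as `6.7` after transposing. Each transposed column now mixes strings and numbers, so pandas stores it with the generic `object` dtype. The sketch below uses a small stand-in DataFrame (an assumption for illustration) to show the dtype change, and that transposing only the numeric columns keeps them as floats.

```python
import pandas as pd

# Stand-in with one string column and two float columns, like the weather data.
df = pd.DataFrame(
    {'Time': ['00:00', '12:00'], 'Temp': [2.34, 6.70], 'Rainfall': [4.45, 8.34]},
    index=['row_a', 'row_b'],
)

# After transposing, every column holds a string and two floats.
transposed = df.T
print(transposed.dtypes)

# Transposing only the numeric columns keeps the float64 dtype.
numeric_transposed = df[['Temp', 'Rainfall']].T
print(numeric_transposed.dtypes)
```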


Find the mean and standard deviation of the weather data.

In [6]:
print(weather_data_dates.describe())

           Temp   Rainfall
count  3.000000   3.000000
mean   2.566667   7.680000
std    4.024790   2.955791
min   -1.340000   4.450000
25%    0.500000   6.395000
50%    2.340000   8.340000
75%    4.520000   9.295000
max    6.700000  10.250000
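
`describe()` reports several statistics at once. If only the mean and standard deviation are needed, `mean()` and `std()` can be called directly on the numeric columns. This is a minimal sketch, again assuming the three sample rows inlined as a string.

```python
import io

import pandas as pd

# The three sample rows, inlined so no file is needed (assumption).
csv_text = """Date,Time,Temp,Rainfall
2014-01-01,00:00,2.34,4.45
2014-01-01,12:00,6.70,8.34
2014-01-02,00:00,-1.34,10.25
"""
weather = pd.read_csv(io.StringIO(csv_text), index_col='Date')

# Select the numeric columns explicitly; 'Time' is text and has no mean.
numeric = weather[['Temp', 'Rainfall']]

means = numeric.mean()
stds = numeric.std()  # sample standard deviation (ddof=1), as in describe()

print(means)
print(stds)
```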


## 3. Extension: Some DataFrame Challenges

1. Find the first three rows of data in `data/americas_gdp.csv` using `head()`.

In [7]:
data_americas = pd.read_csv('../data/americas_gdp.csv')
data_americas.head(3)

Unnamed: 0,continent,country,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007
0,Americas,Argentina,5911.315053,6856.856212,7133.166023,8052.953021,9443.038526,10079.02674,8997.897412,9139.671389,9308.41871,10967.28195,8797.640716,12779.37964
1,Americas,Bolivia,2677.326347,2127.686326,2180.972546,2586.886053,2980.331339,3548.097832,3156.510452,2753.69149,2961.699694,3326.143191,3413.26269,3822.137084
2,Americas,Brazil,2108.944355,2487.365989,3336.585802,3429.864357,4985.711467,6660.118654,7030.835878,7807.095818,6950.283021,7957.980824,8131.212843,9065.800825


2. Find the last 3 **columns** of data.

_Hint: You may need to change your view of the data before you can use `tail()`._

In [8]:
data_americas.T.tail(3)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,15,16,17,18,19,20,21,22,23,24
gdpPercap_1997,10967.28195,3326.143191,7957.980824,28954.92589,10118.05318,6117.361746,6677.045314,5431.990415,3614.101285,7429.455877,...,9767.29753,2253.023004,7113.692252,4247.400261,5838.347657,16999.4333,8792.573126,35767.43303,9230.240708,10165.49518
gdpPercap_2002,8797.640716,3413.26269,8131.212843,33328.96507,10778.78385,5755.259962,7723.447195,6340.646683,4563.808154,5773.044512,...,10742.44053,2474.548819,7356.031934,3783.674243,5909.020073,18855.60618,11460.60023,39097.09955,7727.002004,8605.047831
gdpPercap_2007,12779.37964,3822.137084,9065.800825,36319.23501,13171.63885,7006.580419,9645.06142,8948.102923,6025.374752,6873.262326,...,11977.57496,2749.320965,9809.185636,4172.838464,7408.905561,19328.70901,18008.50924,42951.65309,10611.46299,11415.80569
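
Transposing works, but it also turns the rows into columns. An alternative that keeps the original layout is positional slicing with `iloc`. The sketch below uses a tiny stand-in DataFrame with made-up column names and placeholder values (not the real GDP data) purely to show the idea.

```python
import pandas as pd

# Hypothetical stand-in: placeholder values, not taken from americas_gdp.csv.
df = pd.DataFrame({
    'country': ['A', 'B'],
    'gdp_1997': [1.0, 2.0],
    'gdp_2002': [3.0, 4.0],
    'gdp_2007': [5.0, 6.0],
})

# All rows, last three columns; the column dtypes are preserved.
last_three = df.iloc[:, -3:]
print(last_three)

# The transpose route from the exercise, flipped back for comparison.
via_transpose = df.T.tail(3).T
print(via_transpose.columns.tolist())
```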


3. Use `help(data_americas.to_csv)` to figure out how writing to a CSV file works.

In [9]:
help(data_americas.to_csv)

Help on method to_csv in module pandas.core.generic:

to_csv(path_or_buf: 'FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None' = None, sep: 'str' = ',', na_rep: 'str' = '', float_format: 'str | None' = None, columns: 'Sequence[Hashable] | None' = None, header: 'bool_t | list[str]' = True, index: 'bool_t' = True, index_label: 'IndexLabel | None' = None, mode: 'str' = 'w', encoding: 'str | None' = None, compression: 'CompressionOptions' = 'infer', quoting: 'int | None' = None, quotechar: 'str' = '"', line_terminator: 'str | None' = None, chunksize: 'int | None' = None, date_format: 'str | None' = None, doublequote: 'bool_t' = True, escapechar: 'str | None' = None, decimal: 'str' = '.', errors: 'str' = 'strict', storage_options: 'StorageOptions' = None) -> 'str | None' method of pandas.core.frame.DataFrame instance
    Write object to a comma-separated values (csv) file.
    
    Parameters
    ----------
    path_or_buf : str, path object, file-like object, or None, default None
   

4. Try writing to a CSV file using the code below, giving your own filename. Then look in the data folder to check that the file is there.
```
data_americas.to_csv('data/new_file_name.csv')
```

In [10]:
data_americas.to_csv('../data/processed.csv')
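
By default `to_csv` also writes the row index as an unnamed first column, which is the likely origin of the `Unnamed: 0` column seen when `americas_gdp.csv` was loaded. Passing `index=False` leaves it out. The sketch below uses a small stand-in DataFrame and a temporary directory (both assumptions, so nothing is written to the data folder) to compare the two.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data with placeholder values.
df = pd.DataFrame({'country': ['A', 'B'], 'gdp': [1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    with_index = os.path.join(tmp_dir, 'with_index.csv')
    without_index = os.path.join(tmp_dir, 'without_index.csv')

    # Default: the index is written as an extra, unnamed first column.
    df.to_csv(with_index)
    # index=False writes only the data columns.
    df.to_csv(without_index, index=False)

    # Read both files back before the temporary directory is removed.
    cols_with = pd.read_csv(with_index).columns.tolist()
    cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)     # ['Unnamed: 0', 'country', 'gdp']
print(cols_without)  # ['country', 'gdp']
```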