# <p style="background-color: #f5df18; padding: 10px;">Programming & Plotting in Python | **Reading Tabular Data into DataFrames** </p>



### <strong>Instructor: <span style="color: darkblue;">Name (Affiliation)</span></strong>

Estimated completion time: 🕚 20 minutes


<div style="display: flex;">
    <div style="flex: 1; margin-right: 20px;">
        <h2>Questions</h2>
        <ul>
            <li>How can I read tabular data?</li>
        </ul>
    </div>
    <div style="flex: 1;">
        <h2>Learning Objectives</h2>
        <ul>
            <li>Import the Pandas library.</li>
            <li>Use Pandas to load a simple CSV data set.</li>
            <li>Get some basic information about a Pandas DataFrame.</li>
        </ul>
    </div>
</div>


## Use the Pandas library to do statistics on tabular data.
---

- [Pandas](https://pandas.pydata.org/) is a widely-used Python library for statistics, particularly on tabular data.
- It borrows many features from R's dataframes.
  - A dataframe is a 2-dimensional table whose columns have names
    and potentially have different data types.
- Load Pandas with `import pandas as pd`. The alias `pd` is commonly used to refer to the Pandas library in code.
- Read a Comma Separated Values (CSV) data file with `pd.read_csv`.
  - The argument is the name of the file to be read.
  - It returns a dataframe that you can assign to a variable.


In [None]:
import pandas as pd

data_oceania = pd.read_csv('data/gapminder_gdp_oceania.csv')
print(data_oceania)

- The columns in a dataframe are the observed variables, and the rows are the observations.
- Pandas uses backslash `\` to show wrapped lines when output is too wide to fit the screen.
- Using descriptive dataframe names helps us distinguish between multiple dataframes so we won't accidentally overwrite a dataframe or read from the wrong one.

## 🔔 File Not Found
---

Our lessons store their data files in a `data` sub-directory,
which is why the path to the file is `data/gapminder_gdp_oceania.csv`.
If you forget to include `data/`,
or if you include it but your copy of the file is somewhere else,
you will get a [runtime error](04-built-in.md)
that ends with a line like this:

```error
FileNotFoundError: [Errno 2] No such file or directory: 'data/gapminder_gdp_oceania.csv'
```
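
If you hit this error, it can help to check where Python is looking before calling `read_csv`. The following is a minimal sketch using the standard library's `pathlib`; the variable name `data_file` is only illustrative, and the result depends on where your notebook is running.

```python
from pathlib import Path

# Relative path used in this lesson; it is resolved against the
# current working directory, not against the notebook's location on disk.
data_file = Path('data/gapminder_gdp_oceania.csv')

print('Working directory:', Path.cwd())
print('Looking for:', data_file.resolve())
print('File found:', data_file.exists())
```

If `File found` is `False`, either move the file into a `data` folder next to your notebook or adjust the path you pass to `read_csv`.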


##  Use `index_col` to specify that a column's values should be used as row headings.
---

- By default, row headings are numbers (0 and 1 in this case).
- We would rather index the rows by country.
- Pass the name of the column to `read_csv` as its `index_col` parameter to do this.
- Naming the dataframe `data_oceania_country` tells us which region the data includes (`oceania`) and how it is indexed (`country`).
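
The points above can be sketched with a small inline stand-in for the lesson's file, so the example runs without any data on disk. The CSV text and its numbers are placeholders made up for illustration, not the real Gapminder values, and only two of the file's columns are imitated.

```python
import io
import pandas as pd

# Hypothetical excerpt standing in for data/gapminder_gdp_oceania.csv.
# The numbers are placeholders, not real Gapminder values.
csv_text = """country,gdpPercap_1952,gdpPercap_1957
Australia,10000.0,11000.0
New Zealand,10500.0,12000.0
"""

# Default behaviour: rows are labelled 0, 1 and 'country' is an ordinary column.
data_oceania = pd.read_csv(io.StringIO(csv_text))
print(data_oceania)

# With index_col, the values in 'country' become the row headings.
data_oceania_country = pd.read_csv(io.StringIO(csv_text), index_col='country')
print(data_oceania_country)
```

With the lesson's file, the equivalent call would be `pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')`.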

## Use the `DataFrame.info()` method to find out more about a dataframe.
---


- Calling `data_oceania_country.info()` reports that this is a `DataFrame`.
- It has two rows, named `'Australia'` and `'New Zealand'`.
- It has twelve columns, each of which holds two non-null 64-bit floating-point values.
  - We will talk later about null values, which are used to represent missing observations.
- It uses 208 bytes of memory.
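
As a rough illustration, the sketch below calls `info()` on a small inline stand-in for the Oceania data. The CSV text is hypothetical and has only two data columns with placeholder values, so the column count and memory figure it prints will differ from the ones listed above.

```python
import io
import pandas as pd

# Hypothetical excerpt; placeholder numbers, not real Gapminder values.
csv_text = """country,gdpPercap_1952,gdpPercap_1957
Australia,10000.0,11000.0
New Zealand,10500.0,12000.0
"""
data_oceania_country = pd.read_csv(io.StringIO(csv_text), index_col='country')

# info() prints its report itself (class, index, columns, dtypes, memory),
# so there is no need to wrap it in print().
data_oceania_country.info()
```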

## The `DataFrame.columns` variable stores information about the dataframe's columns.
---

- Note that this is data, *not* a method.  (It doesn't have parentheses.)
  - Like `math.pi`.
  - So do not use `()` to try to call it.
- Called a *member variable*, or just *member*.
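
A short sketch of the difference between reading `columns` and trying to call it, again using a hypothetical inline excerpt with placeholder values rather than the lesson's file:

```python
import io
import pandas as pd

# Hypothetical excerpt; placeholder numbers, not real Gapminder values.
csv_text = """country,gdpPercap_1952,gdpPercap_1957
Australia,10000.0,11000.0
New Zealand,10500.0,12000.0
"""
data_oceania_country = pd.read_csv(io.StringIO(csv_text), index_col='country')

# No parentheses: columns is data stored on the dataframe.
print(data_oceania_country.columns)

# Adding parentheses tries to call that data as if it were a function,
# which raises a TypeError.
try:
    data_oceania_country.columns()
except TypeError as error:
    print('TypeError:', error)
```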

## Use `DataFrame.T` to transpose a dataframe.
---

- Sometimes we want to treat columns as rows and vice versa.
- Transpose (written `.T`) generally doesn't need to copy the data when every column has the same type, as here; it just changes the program's view of it.
- Like `columns`, it is a member variable.
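
A minimal sketch of transposing, assuming a hypothetical three-column excerpt with placeholder values in place of the lesson's file:

```python
import io
import pandas as pd

# Hypothetical excerpt; placeholder numbers, not real Gapminder values.
csv_text = """country,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962
Australia,10000.0,11000.0,12000.0
New Zealand,10500.0,12000.0,13000.0
"""
data_oceania_country = pd.read_csv(io.StringIO(csv_text), index_col='country')

# Original layout: one row per country, one column per year.
print(data_oceania_country)

# Transposed layout: one row per year, one column per country.
print(data_oceania_country.T)
```

The original dataframe is left as it was; `.T` gives you a new dataframe object with rows and columns swapped.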


## Use `DataFrame.describe()` to get summary statistics about data.
---

`DataFrame.describe()` gets the summary statistics of only the columns that have numerical data.
All other columns are ignored, unless you use the argument `include='all'`.

- Not particularly useful with just two records,
  but very helpful when there are thousands.
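
The sketch below shows both behaviours on a small inline table. The `region` column is a hypothetical text column added only to demonstrate `include='all'`; it is not part of the lesson's file, and the numbers are placeholders.

```python
import io
import pandas as pd

# Hypothetical excerpt with one text column ('region') and one numeric column.
# The values are placeholders, not real Gapminder data.
csv_text = """country,region,gdpPercap_1952
Australia,Oceania,10000.0
New Zealand,Oceania,10500.0
"""
data_oceania_country = pd.read_csv(io.StringIO(csv_text), index_col='country')

# Default: only the numeric column is summarised.
summary = data_oceania_country.describe()
print(summary)

# include='all' also summarises the text column (count, unique, top, freq).
summary_all = data_oceania_country.describe(include='all')
print(summary_all)
```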

## <p style="background-color: #f5df18; padding: 10px;"> 🛑 Reading other data </p>

---

Read the data in `gapminder_gdp_americas.csv`
(which should be in the same directory as `gapminder_gdp_oceania.csv`)
into a variable called `data_americas`
and display its summary statistics.

In [None]:
### your answer here ####

## <p style="background-color: #f5df18; padding: 10px;"> 🛑 Inspecting data </p>

---

After reading the data for the Americas,
use `help(data_americas.head)` and `help(data_americas.tail)`
to find out what `DataFrame.head` and `DataFrame.tail` do.

1. What method call will display the first three rows of this data?
2. What method call will display the last three columns of this data?
  (Hint: you may need to change your view of the data.)

In [None]:
### your answer here ####

## <p style="background-color: #f5df18; padding: 10px;"> 🛑 Reading Files in Other Directories </p>
---

The data for your current project is stored in a file called `microbes.csv`,
which is located in a folder called `field_data`.
You are doing analysis in a notebook called `analysis.ipynb`
in a sibling folder called `thesis`:

```output
your_home_directory
+-- field_data/
|   +-- microbes.csv
+-- thesis/
    +-- analysis.ipynb
```

What value(s) should you pass to `read_csv` to read `microbes.csv` in `analysis.ipynb`?

In [None]:
### your answer here ####

## <p style="background-color: #f5df18; padding: 10px;"> 🛑 Writing data </p>
---

As well as the `read_csv` function for reading data from a file,
Pandas provides a `to_csv` method on dataframes for writing them to files.
Applying what you've learned about reading from files,
write one of your dataframes to a file called `processed.csv`.
You can use `help` to get information on how to use `to_csv`.

In [None]:
### your answer here ####

# <p style="background-color: #f5df18; padding: 10px;"> 🗝️ Key points</p>
---

- Use the Pandas library to get basic statistics out of tabular data.
- Use `index_col` to specify that a column's values should be used as row headings.
- Use `DataFrame.info` to find out more about a dataframe.
- The `DataFrame.columns` variable stores information about the dataframe's columns.
- Use `DataFrame.T` to transpose a dataframe.
- Use `DataFrame.describe` to get summary statistics about data.