# E 01 Read some data and look at it 

My approach to teaching you Python is to let you *do* things. In this example you are going to do a lot of things, quite fast. That doesn't mean that you'll *understand* everything you did, but with this very first crash course I hope to give you a taste of how Python works.

## Get the data

The data files for our exercises are available on OLAT. 

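Copy the file data_Zhadang.csv from the data folder on the course OLAT page into a folder named data located next to your copy of this notebook, so that the relative path data/data_Zhadang.csv used in the code below points to the file.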

**Q: Open the file with a text editor. What kind of file is it? What does "csv" stand for?**

## Read the data

To read and analyse the data, we are going to use a very powerful package called [pandas](http://pandas.pydata.org/). Pandas is one of the reasons why scientists are moving to Python. It is really cool, as I hope to be able to show you now. 

In [None]:
import pandas as pd  # pd is the short name for pandas. It's good to stick to it. 
# While we are at it, let's import some other things we might need later
%matplotlib inline
import matplotlib.pyplot as plt  # plotting library
import numpy as np  # numerical library

In [None]:
# We are now using pandas to read the data out of the csv file
# The first argument to the function is the path to the file, the other
# arguments are called "keyword arguments". They tell pandas to do certain things:
# index_col=0 uses the first column as index, parse_dates=True reads it as dates
df = pd.read_csv('data/data_Zhadang.csv', index_col=0, parse_dates=True)

df is a new variable we just created. It is the short name for "dataframe". A dataframe is a kind of table, a little bit like in Excel. Let's simply print it:

In [None]:
df

The dataframe has one "column" (TEMP_2M) and an "index" (a timestamp). A dataframe comes with many useful functions (called "methods"), for example you can make a plot out of it:

In [None]:
df.plot();
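
If you are curious about what the two keyword arguments of `pd.read_csv` actually change, the following sketch may help. It does not use the course file: it reads a tiny, made-up CSV text instead (the names `toy_csv`, `toy_plain` and `toy_df`, as well as the timestamps and values, are invented for illustration), so that you can compare the result with and without the keywords.

```python
import io

import pandas as pd

# A tiny, made-up CSV text (NOT the real Zhadang data), used only for illustration
toy_csv = """time,TEMP_2M
2011-05-12 00:00:00,-3.5
2011-05-12 01:00:00,-4.0
2011-05-12 02:00:00,-4.2
"""

# Without keywords: pandas adds a default integer index and "time" stays a normal column
toy_plain = pd.read_csv(io.StringIO(toy_csv))

# With keywords: the first column becomes the index and is converted to timestamps
toy_df = pd.read_csv(io.StringIO(toy_csv), index_col=0, parse_dates=True)

print(toy_plain.columns.tolist())   # column names without the keywords
print(toy_df.columns.tolist())      # column names with the keywords
print(type(toy_df.index).__name__)  # the kind of index pandas created
```

The time-based selections in the rest of this notebook rely on the index being made of timestamps, which is probably why both keywords are used when reading the file.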

## Select parts of the data

Pandas is really good at *indexing* data. This means that selecting, for example, a specific day and plotting it on a graph takes very little code:

In [None]:
df_sel = df.loc['2011-05-12'] 
df_sel.plot();

**Q: Now select another day (for example July 1st 2011), and plot the data.**

In [None]:
# your answer here

Note that you can also select a specific range of time, for example the first week of July:

In [None]:
df.loc['2011-07-01':'2011-07-07'].plot();

Or the month of August:

In [None]:
df.loc['2011-08'].plot();
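
To check how much data each kind of selection returns, here is a small sketch on made-up hourly data (the names `toy_index` and `toy_df` and the values are invented, this is not the Zhadang file). It counts the rows returned by a day, a range of days and a month:

```python
import numpy as np
import pandas as pd

# Made-up hourly data from May to August 2011 (NOT the real measurements)
toy_index = pd.date_range('2011-05-01 00:00', '2011-08-31 23:00', freq='h')
toy_df = pd.DataFrame({'TEMP_2M': np.arange(len(toy_index), dtype=float)},
                      index=toy_index)

one_day = toy_df.loc['2011-06-15']                # all rows of that day
one_week = toy_df.loc['2011-06-01':'2011-06-07']  # both end days are included
one_month = toy_df.loc['2011-08']                 # all rows of that month

print(len(one_day), len(one_week), len(one_month))  # → 24 168 744
```

Note that, unlike the usual Python slices, a time range selected with `.loc` includes its end: the week above contains all 24 hours of June 7th.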

## Computing averages

Pandas comes with very handy operations like "[resample](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html)". Resample groups the data into regular time intervals (for example days) so that you can compute statistics for each interval. It's better explained with an example:

In [None]:
daily_mean = df.resample('D').mean()
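
To see what such a call computes, it can help to try it on data small enough to check by hand. The sketch below uses four made-up values, two per day (the names `toy_index`, `toy_df` and `toy_daily` and the numbers are invented for illustration):

```python
import pandas as pd

# Four made-up values, two per day (NOT the real measurements)
toy_index = pd.to_datetime(['2011-05-12 06:00', '2011-05-12 18:00',
                            '2011-05-13 06:00', '2011-05-13 18:00'])
toy_df = pd.DataFrame({'TEMP_2M': [1.0, 3.0, 5.0, 7.0]}, index=toy_index)

# Group the rows by calendar day, then average each group
toy_daily = toy_df.resample('D').mean()
print(toy_daily)
```

The result has one row per day: 2.0 for May 12th (the mean of 1.0 and 3.0) and 6.0 for May 13th (the mean of 5.0 and 7.0).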

**Q: Print the daily_mean variable. Plot it.**

In [None]:
# your answer here

**Q: Now try the functions df.resample('D').max() and df.resample('D').min(). What will they do? Plot them.**

In [None]:
# your answer here

## Adding columns to dataframes and selecting them

Columns in the dataframe can be created with the simple syntax:

In [None]:
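# Select the TEMP_2M column first, so that a single column is assigned each time
daily_mean['TEMP_MAX'] = df['TEMP_2M'].resample('D').max()
daily_mean['TEMP_MIN'] = df['TEMP_2M'].resample('D').min()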

**Q: Print the daily_mean dataframe. How many columns does it have? Plot it.**

In [None]:
# your answer here

It is easy to select a single column and, for example, plot it alone:

In [None]:
daily_mean['TEMP_MAX'].plot();
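
Here is a small sketch of both operations on a made-up dataframe (the names `toy_df`, `IS_FREEZING`, `single` and `both` and the values are invented for illustration). It also shows a detail that is easy to miss: selecting one column name gives you a "Series", while selecting a list of names gives you a dataframe again.

```python
import pandas as pd

# Made-up daily values (NOT the real measurements)
toy_index = pd.date_range('2011-05-12', periods=3, freq='D')
toy_df = pd.DataFrame({'TEMP_2M': [-3.0, -1.0, 2.0]}, index=toy_index)

# Adding a column: assign to a column name that does not exist yet
toy_df['IS_FREEZING'] = toy_df['TEMP_2M'] < 0

# Selecting columns: one name gives a Series, a list of names gives a DataFrame
single = toy_df['TEMP_2M']
both = toy_df[['TEMP_2M', 'IS_FREEZING']]

print(toy_df.columns.tolist())
print(type(single).__name__, type(both).__name__)
```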

## Operations on columns

Operations on columns are just like normal array operations:

In [None]:
temp_range = daily_mean['TEMP_MAX'] - daily_mean['TEMP_MIN']

**Q: What is temp_range? Plot it. Add it to the daily_mean dataframe.**

In [None]:
# your answer here
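
Column arithmetic is not limited to subtracting one column from another. As a minimal sketch on made-up values (the names `toy_df`, `anomaly` and `abs_anomaly` and the numbers are invented for illustration), you can also combine a column with a single number or pass it to a numpy function. In each case the operation is applied to every row at once:

```python
import numpy as np
import pandas as pd

# Made-up daily values (NOT the real measurements)
toy_index = pd.date_range('2011-05-12', periods=3, freq='D')
toy_df = pd.DataFrame({'TEMP_2M': [1.0, 2.0, 6.0]}, index=toy_index)

# Column minus a single number: the mean (3.0) is subtracted from every row
anomaly = toy_df['TEMP_2M'] - toy_df['TEMP_2M'].mean()

# numpy functions work on columns too and keep the time index
abs_anomaly = np.abs(anomaly)

print(anomaly.tolist())      # → [-2.0, -1.0, 3.0]
print(abs_anomaly.tolist())  # → [2.0, 1.0, 3.0]
```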

## Exercise: apply what you just learned

In the example above, we used resample with the argument 'D', for "daily frequency". The equivalent for monthly would be 'MS' (the "S" is for "start"). Could you repeat the operation above, but with monthly averages instead of daily averages?

In [None]:
# your answer here

In [None]:
# and here if you want

How could you plot two daily temperature cycles in the same figure? Have a go!

In [None]:
# your answer here

To read more about pandas indexing and data selection conventions, look here: https://pandas.pydata.org/pandas-docs/stable/indexing.html