# Pandas 1 - Introduction - Importing Data from Local Drive

Pandas is a Python package that is used for data processing and analysis.  The key object in Pandas is the **data frame**, which holds information in a form that can be easily filtered, sorted, transformed, plotted, analyzed, etc. from Python.  This notebook will cover some of the basic commands in Pandas.

One of the nice features of Pandas is that it can import files of different formats and turn them into a data frame.  These file types include Microsoft Excel, CSV (comma-separated values), JSON (JavaScript Object Notation), and several others.

In this worksheet, we will use a CSV file containing data that can be used to plot the functions $\sin x$, $\cos x$, and $\exp(-x)$.  Download the file "functions.csv" by going to the [Myplace site for CP540](https://classes.myplace.strath.ac.uk/course/view.php?id=27428#section-5) or download the file directly by clicking [here](https://classes.myplace.strath.ac.uk/mod/resource/view.php?id=1758214).
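If you cannot reach Myplace, the sketch below writes a synthetic stand-in that uses the same column names as the rest of this notebook (`x`, `sin x`, `cos x`, `exp(-x)`). The grid of $x$ values (0 to 3 in steps of 0.05) and the file name `functions_synthetic.csv` are assumptions made for illustration, not properties of the real course file, so your numbers will differ from the official data.

```python
import numpy as np
import pandas as pd

# Assumed grid: 61 evenly spaced points from 0 to 3 (the real file may differ).
x_grid = np.linspace(0.0, 3.0, 61)

df_synthetic = pd.DataFrame({
    'x': x_grid,
    'sin x': np.sin(x_grid),
    'cos x': np.cos(x_grid),
    'exp(-x)': np.exp(-x_grid),
})

# By default, to_csv also writes the row index as an unnamed first column;
# read_csv reports such a column as 'Unnamed: 0' when the file is read back.
df_synthetic.to_csv('functions_synthetic.csv')

print(df_synthetic.head())
```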

In the remainder of this notebook, we look at a couple of different ways to import data files into Google Colab to create a Pandas data frame.  Then, we introduce some of the basic commands that can be used to explore and modify the data frame.  Finally, we give a couple of different ways to save data frames.


## Importing files into Google Colab

### Reading a local file

The first way that you can try to import a data file stored on your computer into Google Colab is to read it directly from your local drive.  To do this, first run the following commands in the code block below:

In [None]:
from google.colab import files


uploaded = files.upload()

While this cell is running, left-click the `Browse` button (some browsers label it `Choose Files`), which brings up a file selection menu.  Navigate to the file you want (in this case, "functions.csv") and left-click it.  Once you have done that, the file should appear in the file browser in the far-left column of the notebook.

As we have previously done with packages such as `numpy`, we must first import pandas.

In [None]:
import pandas as pd
import io

df = pd.read_csv(io.BytesIO(uploaded['functions.csv']), skipinitialspace=True)   #This command assigns the data from our imported csv file to a dataframe within our code
print(df)
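`files.upload()` returns a dictionary that maps each uploaded file name to the file's contents as raw bytes, which is why the cell above wraps the contents in `io.BytesIO` before passing them to `read_csv`. The sketch below replaces a real upload with a small hypothetical dictionary, `uploaded_demo`, whose byte string has a space after each comma (an assumed layout, not necessarily that of the real file). It shows what `skipinitialspace=True` changes: without it, the column names keep their leading spaces, so `df['sin x']` would fail.

```python
import io

import pandas as pd

# Hypothetical stand-in for the dictionary returned by files.upload():
# each key is a file name and each value holds that file's raw bytes.
uploaded_demo = {
    'functions.csv': b',x, sin x, cos x\n0,0.0, 0.0, 1.0\n1,0.1, 0.0998, 0.995\n',
}

# io.BytesIO wraps the bytes in a file-like object that read_csv can read.
raw = pd.read_csv(io.BytesIO(uploaded_demo['functions.csv']))

# skipinitialspace=True drops the spaces that follow each comma.
clean = pd.read_csv(io.BytesIO(uploaded_demo['functions.csv']), skipinitialspace=True)

print(list(raw.columns))    # ['Unnamed: 0', 'x', ' sin x', ' cos x']
print(list(clean.columns))  # ['Unnamed: 0', 'x', 'sin x', 'cos x']
```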

### Reading a file from Google Drive

Another way to import a data file is to first upload it to Google Drive and then access it from there.  This has the added benefit that you can work with the data from any computer, as long as you have access to your Google account.

First upload the file "functions.csv", which you should have already downloaded from Myplace, to your Google Drive.  You can put the file in any folder you wish; the code below assumes a folder named `data` at the top level of your Drive.

Now, mount the contents of your Google Drive to Colab by running the following code block:


In [None]:
from google.colab import drive

drive.mount('/content/drive')

Once your Drive is mounted (Colab will ask you to authorize access), you can access any file on your Google Drive from the notebook.

Click on the folder icon in the far-left column of the notebook.  This should reveal a set of folders.  Navigate to the file you want by clicking on the folder named "drive", then "MyDrive", and then the folder where you uploaded "functions.csv".  If you saved it somewhere other than `data`, change `filename` in the next cell to match.


In [None]:
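import pandas as pd

# Path to the file on the mounted Drive; change 'data' if you used a different folder
filename = '/content/drive/MyDrive/data/functions.csv'
df = pd.read_csv(filename, skipinitialspace=True)   # same options as for the local upload above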
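If `read_csv` raises a `FileNotFoundError`, the path in `filename` does not match where the file actually sits on your Drive. One way to find it is to search the mounted Drive for CSV files. The helper `find_csv_files` below is a hypothetical name used only for this sketch; it relies on the standard-library `pathlib.Path.rglob`, which searches every subfolder and may therefore be slow on a large Drive.

```python
from pathlib import Path


def find_csv_files(root):
    """Return a sorted list of every .csv file below `root`, searching all subfolders."""
    return sorted(Path(root).rglob('*.csv'))


# In Colab, after mounting, you might search your whole Drive:
# for path in find_csv_files('/content/drive/MyDrive'):
#     print(path)
```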


## Basic commands

If the previous code blocks have run successfully, we should now have loaded all the data in the file "functions.csv" into a Pandas data frame that is named `df`.  To get an overall look at the data frame, we can use the function `describe()`, which gives a table summarizing some of its key properties:

In [None]:
df.describe()

The header row of the table gives the names of the columns.  The `count` row gives the number of non-missing entries in each column, the `mean` row gives the mean of each column, and the `std` row gives the standard deviation.  The `min` and `max` rows give the smallest and largest values in each column, while the `25%`, `50%`, and `75%` rows give the 25th, 50th, and 75th percentiles (the 50th percentile is the median).
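The `describe()` table is itself a data frame: the statistics form the row labels and the original columns form the column labels, so `.loc` picks out a single value. The sketch below uses a small hypothetical data frame, `demo`, rather than the real file, and checks that the `50%` row matches the median and the `std` row matches `Series.std()`.

```python
import numpy as np
import pandas as pd

# Hypothetical small data frame standing in for functions.csv.
x_demo = np.linspace(0.0, 1.0, 11)
demo = pd.DataFrame({'x': x_demo, 'sin x': np.sin(x_demo)})

summary = demo.describe()

# Each statistic is a row label, and each original column is a column label.
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# The '50%' row is the median of the column.
print(summary.loc['50%', 'x'], demo['x'].median())
```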


Other commands/properties include:

In [None]:
df.head()  # to see the first 5 rows; to see the first n rows, just put the integer n as an argument to the member function head()

In [None]:
df.tail() # to see the final 5 rows; to see the final n rows, just put the integer n as an argument to the member function tail()

In [None]:
df.columns  # the name of each column, returned as a pandas Index (use list(df.columns) for a plain list)

In [None]:
df.shape   # "shape" of the data frame: a tuple (number of rows, number of columns)
           # number of rows: df.shape[0]
           # number of columns: df.shape[1]
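The properties above can be combined as in the short sketch below, which uses a hypothetical 20-row data frame, `demo`, instead of the real file. It unpacks the `shape` tuple, passes an explicit row count to `head` and `tail`, and shows that `len` gives the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row data frame standing in for functions.csv.
x_demo = np.linspace(0.0, 1.9, 20)
demo = pd.DataFrame({'x': x_demo, 'cos x': np.cos(x_demo)})

n_rows, n_cols = demo.shape   # unpack the (rows, columns) tuple
print(n_rows, n_cols)         # 20 2
print(len(demo))              # 20, the same as demo.shape[0]

print(demo.head(3))           # first 3 rows
print(demo.tail(2))           # last 2 rows
print(list(demo.columns))     # ['x', 'cos x']
```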

In [None]:
df['x'].mean()  # mean value of the column "x"

In [None]:
df['x'].std()  # standard deviation of the column "x"

Each column of the data frame can be accessed by its name, in the same way as an entry in a dictionary (for example, `df['x']`).  We can plot the data within a data frame using the [`Matplotlib`](https://matplotlib.org/) library, as seen in previous worksheets:

In [None]:
import matplotlib.pyplot as plt

plt.plot(df['x'], df['sin x'], label=r'$\sin x$')
plt.plot(df['x'], df['cos x'], label=r'$\cos x$')
plt.plot(df['x'], df['exp(-x)'], label='$e^{-x}$')
plt.xlabel(r'$x$')
plt.ylabel(r'function')
plt.legend()
plt.show()
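Because columns are accessed like dictionary entries, `df['x']` returns a single column as a pandas `Series`, while passing a list of names, as in `df[['x', 'sin x']]`, returns a smaller data frame. A short sketch on a hypothetical three-row data frame, `demo`:

```python
import pandas as pd

# Hypothetical three-row data frame.
demo = pd.DataFrame({'x': [0.0, 0.5, 1.0], 'sin x': [0.0, 0.479, 0.841]})

col = demo['x']              # one name        -> Series
sub = demo[['x', 'sin x']]   # list of names   -> DataFrame

print(type(col).__name__)    # Series
print(type(sub).__name__)    # DataFrame
print('sin x' in demo)       # True: membership tests check the column names
```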

We can also directly plot any columns of the data frame against any other column, using the member function `plot`:

In [None]:
df.plot(x='x', y=['sin x', 'cos x', 'exp(-x)'])



We can also plot a histogram of the data in a particular column:

In [None]:
df.hist(column='sin x')

## Modifying data frames



Pandas allows us to not only analyze data frames but also to modify them.  

We see that the imported data has five columns: the first gives an index, the next gives the values of $x$, while the remaining columns give the values of the various functions at the given value of $x$.

The first column is not very useful, so we can simply eliminate it by using the `drop` member function:

In [None]:
df_new = df.drop(columns=['Unnamed: 0'])

print(df_new)

This does not change the data frame `df` itself; instead, `drop` returns a new data frame without the index column, which we store in the variable `df_new`.
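`read_csv` names a column `Unnamed: 0` when the first header cell is empty, which is what `to_csv` writes for the row index by default. Rather than dropping that column afterwards, `read_csv` can use it as the index through `index_col=0`. A sketch, using an in-memory CSV string as a hypothetical stand-in for the file:

```python
import io

import pandas as pd

# Hypothetical CSV text whose first header cell is empty, as written by to_csv().
csv_text = ',x,sin x\n0,0.0,0.0\n1,0.5,0.479\n'

with_extra = pd.read_csv(io.StringIO(csv_text))
as_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))  # ['Unnamed: 0', 'x', 'sin x']
print(list(as_index.columns))    # ['x', 'sin x']
```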

Similarly, we can also eliminate a range of rows by using the `drop` member function:

In [None]:
df_new = df.drop(df.index[10:30])  # eliminate rows 10 to 29, where the count starts at 0
print(df_new)
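`df.index[10:30]` selects index *labels*, and `drop` removes rows by label. For the default integer index (0, 1, 2, ...), labels and positions coincide, so the same result can be obtained by keeping the rows on either side with positional slicing via `iloc`. Note that the remaining rows keep their original labels; `reset_index(drop=True)` renumbers them. A sketch on a hypothetical 40-row data frame, `demo`:

```python
import pandas as pd

# Hypothetical 40-row data frame with the default integer index.
demo = pd.DataFrame({'x': [0.1 * i for i in range(40)]})

dropped = demo.drop(demo.index[10:30])               # remove rows 10..29 by label
kept = pd.concat([demo.iloc[:10], demo.iloc[30:]])   # keep rows 0..9 and 30..39

print(len(dropped))            # 20
print(dropped.equals(kept))    # True

# The labels now jump from 9 to 30; renumber them from 0 if needed.
renumbered = dropped.reset_index(drop=True)
print(renumbered.index[-1])    # 19
```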



We can add a new column simply by creating a list and assigning it to a new column name in the data frame.  For example, let's add a new column called "quadratic" given by the square of the value in the column "x".

In [None]:
df['quadratic'] = [x*x for x in df['x']]
print(df)
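The list comprehension above loops over the values one at a time in Python. Arithmetic applied to a whole column gives the same values and is the more common pandas idiom, particularly for large data frames. A sketch on a hypothetical data frame, `demo`; the column name `quadratic_vec` is made up for the comparison:

```python
import pandas as pd

# Hypothetical four-row data frame.
demo = pd.DataFrame({'x': [0.0, 0.5, 1.0, 1.5]})

# List comprehension, as in the cell above.
demo['quadratic'] = [x * x for x in demo['x']]

# Vectorised arithmetic on the whole column.
demo['quadratic_vec'] = demo['x'] ** 2

print(demo)
print(demo['quadratic'].equals(demo['quadratic_vec']))  # True
```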

We can also filter data frames to focus on the particular data that we are interested in.  Some examples include:

In [None]:
df_filter = df[df['x'] < 0.5]
print('case 1')
print(df_filter)

df_filter = df[(df['x'] < 1.0) & (df['x'] > 0.5)]
print()
print('case 2')
print(df_filter)

df_filter = df[(df['x'] < 1.0) | (df['x'] > 2.0)]
print()
print('case 3')
print(df_filter)
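The parentheses around each comparison are required: in Python, `&` and `|` bind more tightly than `<` and `>`, so writing the condition without them is evaluated in the wrong order and typically raises an error. `DataFrame.query` takes the condition as a string and allows a chained comparison; inside `query`, column names that contain spaces must be wrapped in backticks. A sketch on a hypothetical data frame, `demo`:

```python
import pandas as pd

# Hypothetical four-row data frame.
demo = pd.DataFrame({'x': [0.25, 0.75, 1.5, 2.5],
                     'sin x': [0.247, 0.682, 0.997, 0.598]})

# Boolean mask with explicit parentheses, as in case 2 above.
mask_result = demo[(demo['x'] < 1.0) & (demo['x'] > 0.5)]

# The same filter written as a query string with a chained comparison.
query_result = demo.query('0.5 < x < 1.0')

print(mask_result.equals(query_result))  # True

# Column names with spaces need backticks inside query().
print(demo.query('`sin x` > 0.9'))
```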

## Exporting data frames


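Just as we imported data either from the local drive or from Google Drive, we can export a data frame in either way.  The first option writes the data frame to a CSV file in the Colab session and then downloads it to your computer: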

In [None]:
from google.colab import files

df.to_csv('junk.csv')
files.download('junk.csv')

To save the data frame to Google Drive instead, first mount your Drive:

In [None]:
from google.colab import drive

drive.mount('/content/drive')

If Google Drive has already been mounted, there is no need to re-run this command.  To export the data frame to a CSV file named "junk.csv" in the folder "data" on your Google Drive, run:

In [None]:
import pandas as pd

filename = '/content/drive/MyDrive/data/junk.csv'
df.to_csv(filename)
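`to_csv` does not create missing folders, so writing to `.../MyDrive/data/junk.csv` fails if the `data` folder does not exist yet. Passing `index=False` also leaves out the row index, which avoids the `Unnamed: 0` column when the file is read back. A minimal sketch, assuming a temporary folder as a stand-in for your Drive so that it runs anywhere; in Colab you would point `base` at `/content/drive/MyDrive` instead. The names `demo` and `demo_file` are hypothetical.

```python
import os
import tempfile

import pandas as pd

demo = pd.DataFrame({'x': [0.0, 0.5], 'sin x': [0.0, 0.479]})

# Temporary folder as a stand-in; in Colab this might be '/content/drive/MyDrive'.
base = tempfile.mkdtemp()
folder = os.path.join(base, 'data')

os.makedirs(folder, exist_ok=True)   # create the folder if it is missing
demo_file = os.path.join(folder, 'junk.csv')

demo.to_csv(demo_file, index=False)  # omit the row index column

print(pd.read_csv(demo_file).columns.tolist())  # ['x', 'sin x']
```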

## Conclusion

In this notebook, we introduced Pandas.  We showed how to import data into a Pandas data frame and looked at some basic commands.  For more information on Pandas, there are several useful introductory tutorials in the "Intro to pandas" section of the [Getting started](https://pandas.pydata.org/docs/getting_started/index.html) page.  For more in-depth material, you can refer to the [User Guide](https://pandas.pydata.org/docs/user_guide/index.html).  In