# Title
### Name

Give an introduction here. Provide context to the research you will or have conducted: why is it important, what insights did you find, what's the story behind your data?

## Getting Started

Let's start by importing any libraries we may need. Feel free to add any other libraries or Python files you want to use.

In [None]:
# import any relevant libraries or python files
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt

Load your data here. Don't forget to give the source and describe any relevant details about how data was collected!

In [None]:
# create a dataframe (df) that contains all the data in <filename>

# if <filename> has a .csv file extension, use the following
df = pd.read_csv("../data/filename.csv")

# if <filename> has a .xls or .xlsx file extension, use this instead:
# uncomment the line below and comment out the read_csv line above
# df = pd.read_excel("../data/filename.xlsx")

Note that in the above functions, <tt>../data/</tt> means that the data is in the data directory. If your data is not in this directory, then move it there. If you have changed the name of the data directory, say to <tt>Data</tt>, then make sure to also change this path to <tt>../Data/</tt>. Lastly, check that the text with which you replace <tt>filename...</tt> exactly matches the name of the file your data is in.
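If you are unsure whether the path is right, a quick check like the one below can help before loading. This is only a sketch: <tt>describe_path</tt> is a hypothetical helper written for this example, and <tt>filename.csv</tt> is still a placeholder for your own file name.

```python
from pathlib import Path

# Placeholder path; replace "filename.csv" with your actual file name
data_path = Path("../data/filename.csv")


def describe_path(path):
    """Return a short message saying whether the file exists at this path."""
    if path.exists():
        return f"Found {path.name}"
    return f"Could not find {path}; check the directory and file name"


print(describe_path(data_path))
```

Keep in mind that on some operating systems file and directory names are case sensitive, so <tt>data</tt> and <tt>Data</tt> may be treated as different directories.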

Delete this text box when you're done by clicking on the text, then pressing the d key twice. If this box is outlined in green, it means it's editable. Simply click within the green frame, but to the left of the text box, to change the edit mode. For more tips and tricks on using Jupyter notebooks, click the <tt>Help</tt> drop-down menu.

Let's take a look at the data!

In [None]:
# the head function returns the first n rows of the dataframe,
# where n is the number in the parentheses
df.head(10)

In the dataframe shown above, check whether the column names contain data from the first row, i.e. a value in place of a name such as <tt>ID</tt> or <tt>STATUS</tt>. If the column names are in fact data, then you have two options: add a header row manually (insert a row at the top of your data file and name each column) or add it through Python!

The code cell below shows how to do this using Python. If your file is not a csv file, then make sure to change <tt>read_csv</tt> to <tt>read_excel</tt>, as you did above! Then, simply delete the <tt>#</tt> and space preceding each line (except the first two lines), then click <tt>Kernel -> Restart &amp; Run All</tt>.

In [None]:
# reload the data and name the columns, change col1 etc. to
# names appropriate to your data. Make sure you name each column!
# df = pd.read_csv("../data/filename.csv", header=None)
# df.columns = ["col1", "col2", "col3"]
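To see why a missing header row matters, here is a small self-contained sketch. The file contents and the column names <tt>ID</tt>, <tt>STATUS</tt> and <tt>SCORE</tt> are made up for illustration, and <tt>io.StringIO</tt> stands in for a real file on disk.

```python
import io

import pandas as pd

# Made-up file contents with no header row
raw = "1,active,3.5\n2,inactive,4.0\n"

# Default behaviour: the first data row is used up as the column names
wrong = pd.read_csv(io.StringIO(raw))

# header=None keeps every row as data; names= labels the columns in the same call
right = pd.read_csv(io.StringIO(raw), header=None, names=["ID", "STATUS", "SCORE"])

print(len(wrong), "row(s) when the first row is treated as a header")
print(len(right), "row(s) when header=None is used")
```

Passing <tt>names=</tt> to <tt>read_csv</tt> is an alternative to assigning <tt>df.columns</tt> afterwards; both approaches give the same column labels.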

Now, let's make sure that the data is the correct type. Here's a brief rundown on types:
<ul>
    <li>Is it an integer? Its type is: <tt>int</tt></li>
    <li>Is it a number with a decimal? Its type is: <tt>float</tt></li>
    <li>Is it a date? Its type is: <tt>datetime64</tt></li>
    <li>Is it a letter, word, or a sequence of words? Its type is a string (<tt>str</tt> in Python), which is an <tt>object</tt> in a dataframe (no type conversion required)</li>
</ul>

The following code cell shows what pandas detects a column type to be. If all columns are correctly represented, then you're ready to begin analysis! Otherwise, follow the next cell to convert a column to its corresponding type.
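A minimal sketch of such a check is shown here on a small made-up dataframe called <tt>example</tt>; in your notebook you would run <tt>df.dtypes</tt> on your own data instead.

```python
import pandas as pd

# Made-up data standing in for your own dataframe
example = pd.DataFrame({
    "ID": [1, 2, 3],
    "SCORE": [3.5, 4.0, 2.25],
    "STATUS": ["active", "inactive", "active"],
})

# dtypes lists the type pandas detected for each column
print(example.dtypes)
```

Whole numbers usually show up as an <tt>int</tt> type and decimals as a <tt>float</tt> type. How text columns are reported can depend on your pandas version.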

In [None]:
# convert a column (called col_name) to a datetime object (in day 
# month year format), dates that do not match the expected format
# are coerced into NaT (not a time)
df[col_name] = pd.to_datetime(df[col_name], format='%d%m%y', errors='coerce')

# convert a column (called col_name) to a number, forcing non-numbers
# to NaN (not a number)
df[col_name] = pd.to_numeric(df[col_name], errors='coerce')
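Because <tt>errors='coerce'</tt> silently turns unreadable values into missing ones, it is worth counting how many values were lost. The sketch below uses a made-up dataframe with hypothetical <tt>DATE</tt> and <tt>AMOUNT</tt> columns to show the idea.

```python
import pandas as pd

# Made-up data: each column contains one value that cannot be converted
example = pd.DataFrame({
    "DATE": ["010223", "150623", "not a date"],
    "AMOUNT": ["12", "7.5", "unknown"],
})

# Same conversions as above, applied to the example columns
example["DATE"] = pd.to_datetime(example["DATE"], format="%d%m%y", errors="coerce")
example["AMOUNT"] = pd.to_numeric(example["AMOUNT"], errors="coerce")

# Count the values in each column that were coerced to NaT or NaN
print(example.isna().sum())
```

If the count is higher than you expect, the format string or the raw data probably needs another look before you continue.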

If your dataset contains "Yes" and "No" values (or any binary options) and you would like to convert the column to boolean or int type, change the following excerpt of code as needed.

In [None]:
# convert "Yes" and "No" to True and False
df[col_name] = df[col_name].map({'Yes': True, 'No': False})

# convert "Male", "Female" and "Prefer not to say" to 0, 1 and 2
gender_dictionary = {'Male': 0, 'Female': 1, 'Prefer not to say': 2}
df[col_name] = df[col_name].map(gender_dictionary)
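One thing to watch for: when <tt>map</tt> is given a dictionary, any value that is not one of its keys becomes <tt>NaN</tt>, including differences in capitalisation. The sketch below uses a made-up series called <tt>answers</tt> to find such values.

```python
import pandas as pd

# Made-up survey answers; "yes" and "Maybe" are not keys in the dictionary
answers = pd.Series(["Yes", "No", "yes", "Maybe"])

mapped = answers.map({"Yes": True, "No": False})

# Values that had no matching key were turned into NaN by map
unmapped = answers[mapped.isna()]
print(unmapped.tolist())
```

Checking for unmapped values this way can reveal spelling or capitalisation variants that you may want to add to the dictionary.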

## Analysis and More Demos

There are more example notebooks in the <tt>Getting Started</tt> and <tt>Demos</tt> folders, including:
<ul>
    <li><a href="https://github.com/kinges17/Jupyter-Project/tree/master/Getting-Started/Jupyter-and-Python-Tutorials">How to use Python</a></li>
    <li><a href="https://github.com/kinges17/Jupyter-Project/tree/master/Getting-Started/R-in-Notebooks">How to use R</a></li>
    <li><a href="https://github.com/kinges17/Jupyter-Project/tree/master/Getting-Started">Binder and Azure Notebooks</a></li>
    <li><a href="https://github.com/kinges17/Jupyter-Project/tree/master/Demos">Data science example projects</a></li>
    <li><a href="https://github.com/kinges17/Jupyter-Project/tree/master/Demos">Descriptive statistics in a notebook</a></li>
    <li>... and more</li>
</ul>

Hope this is enough to get you started!