## Pandas

In [None]:
import pandas as pd

Before running the next command, make sure that the CSV file (saman1.csv) is in the same folder as your notebook.

In [None]:
df = pd.read_csv("saman1.csv")  

In [None]:
df

In [None]:
df.head()

In [None]:
df.head(2)

In [None]:
df["Age"]

In [None]:
df.iloc[:, 3]

In [None]:
df.iloc[3, :]

In [None]:
df.iloc[3, 2]

In [None]:
df.iloc[3, 1:4]

In [None]:
df.iloc[2, [1, 2]]
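
The `iloc` cells above select by integer position, counting from 0, with the row position first and the column position second. Here is a minimal sketch on a small stand-in table. The values are made up for illustration and are not taken from saman1.csv, and the name `demo` is hypothetical:

```python
import pandas as pd

# Hypothetical stand-in data; the real saman1.csv may look different.
demo = pd.DataFrame({
    "sample_num": [1, 2, 3, 4],
    "First_name": ["Ann", "Ben", "Cal", "Dee"],
    "Last_name": ["Lee", "Kim", "Roy", "Poe"],
    "Age": [34, 51, 29, 45],
})

column_3 = demo.iloc[:, 3]      # every row, fourth column ("Age")
row_3 = demo.iloc[3, :]         # fourth row, every column
cell = demo.iloc[3, 2]          # fourth row, third column
part = demo.iloc[3, 1:4]        # fourth row, columns 1, 2, 3 (4 is excluded)
picked = demo.iloc[2, [1, 2]]   # third row, columns 1 and 2

print(cell)
print(list(part.index))
```

Note that a slice such as `1:4` stops before position 4, just like slicing a Python list.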

In [None]:
df.iloc[2, 2] = "Terner"

In [None]:
df.to_csv("updated_data.csv")

When saving into a different directory, first make sure that the path exists on your computer - otherwise you will get an error.

In [None]:
df.to_csv("some_path/updated_data.csv")
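
One way to avoid that error is to create the folder from Python before saving. The sketch below assumes a temporary base folder and a small made-up table (both hypothetical), so it can run anywhere:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in data and a temporary base folder for the demo.
demo = pd.DataFrame({"First_name": ["Ann", "Ben"], "Age": [34, 51]})
base = Path(tempfile.mkdtemp())

out_dir = base / "some_path"
out_dir.mkdir(parents=True, exist_ok=True)   # create the folder if it is missing

out_file = out_dir / "updated_data.csv"
demo.to_csv(out_file, index=False)           # index=False leaves out the row labels

print(out_file.exists())
```

Passing `index=False` is a choice made for this sketch: without it, the row labels are written as an extra first column.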

## Part II - a walk-through tutorial

### Basic DataFrame characteristics:

### shape 
To get the DataFrame's size (the number of rows and columns), we can use the attribute **shape**. Type the DataFrame's name ('df' in this case), then '.', and then 'shape'. To execute, press SHIFT-ENTER.

In [None]:
df.shape
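
`shape` is a tuple of the form (rows, columns), so it can be unpacked into two variables. A small sketch on made-up data (the names `demo`, `n_rows`, and `n_cols` are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in data: 3 rows, 2 columns.
demo = pd.DataFrame({"Age": [34, 51, 29], "Weight_before": [80, 95, 70]})

n_rows, n_cols = demo.shape   # shape is an attribute, so no parentheses
print(n_rows, n_cols)
```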

**TIP:** To see possible completions, try pressing **TAB** after you type 'df.':

![tab.jpg](attachment:tab.jpg)

### dtypes 
To see the types of the variables/columns in the dataframe, type **df.dtypes**:

In [None]:
df.dtypes

**Note:** The dtype object indicates **strings** - here shown for the columns First_name and Last_name.
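
How text columns are reported can depend on the pandas version: older versions show `object`, while newer ones may show a dedicated string dtype. The helper functions in `pandas.api.types` let us test for the kind of column without relying on the exact name. A sketch on made-up data:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_string_dtype

# Hypothetical stand-in data with one text column and one numeric column.
demo = pd.DataFrame({"First_name": ["Ann", "Ben"], "Age": [34, 51]})

for name in demo.columns:
    column = demo[name]
    # Print the column name, its dtype, and what kind of data it holds.
    print(name, column.dtype, is_string_dtype(column), is_numeric_dtype(column))
```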

### columns 
If what we want is simply a list with the names of the columns, we can write df.columns, and then
press SHIFT-ENTER.

In [None]:
df.columns
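
Strictly speaking, `df.columns` is a pandas Index object rather than a plain Python list. If a real list is needed, it can be converted, as in this sketch on made-up data:

```python
import pandas as pd

# Hypothetical stand-in data with three columns.
demo = pd.DataFrame({"First_name": ["Ann"], "Last_name": ["Lee"], "Age": [34]})

names = demo.columns.tolist()     # plain Python list of column names
print(names)
print("Age" in demo.columns)      # membership test works on the Index directly
```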

## Conditions

We've learnt how to print out the entire table or just its first rows. But what if I want to see just the records (rows) of individuals who are over 40 years old? Or, individuals who have lost at least 10kg? 
In many programming languages, this would require using loops and IF commands,
but in Pandas all it takes is one simple line! Just like that...

In [None]:
df[df["Age"]>40]

**How to read this?** We choose to "see" the rows of df such that df["Age"] (or df.Age, another syntax) is greater than 40. The condition is within the square brackets.
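
It can help to look at the condition on its own. The comparison produces a column of True/False values (often called a mask), and the square brackets keep only the rows where the mask is True. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical stand-in data.
demo = pd.DataFrame({"First_name": ["Ann", "Ben", "Cal"], "Age": [34, 51, 45]})

mask = demo["Age"] > 40        # Series of True/False, one value per row
over_40 = demo[mask]           # keeps only the rows where the mask is True

print(mask.tolist())
print(over_40["First_name"].tolist())
```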

### Multiple conditions
Say I want to see individuals whose age is over 40 that **also** lost more than 20kg during the treatment. 

We begin by writing the condition just as before - but to enable additional conditions, **we place each condition within parentheses**, and add the ampersand sign. The parentheses are needed because & is evaluated before comparisons such as >. Note that the syntax is slightly different from that of plain Python, where the word 'and' is used. 

In [None]:
df[(df["Age"]>40) & (df['Weight_before']-df['Weight_after']>20)]
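
The same pattern extends to "or" (`|`) and "not" (`~`). The sketch below uses made-up values for the columns Age, Weight_before, and Weight_after; the helper name `lost` is hypothetical:

```python
import pandas as pd

# Hypothetical stand-in data.
demo = pd.DataFrame({
    "First_name": ["Ann", "Ben", "Cal", "Dee"],
    "Age": [34, 51, 45, 62],
    "Weight_before": [80, 95, 100, 70],
    "Weight_after": [70, 70, 90, 65],
})

lost = demo["Weight_before"] - demo["Weight_after"]   # weight lost per row

both = demo[(demo["Age"] > 40) & (lost > 20)]      # AND: both conditions hold
either = demo[(demo["Age"] > 40) | (lost > 20)]    # OR: at least one holds
not_over_40 = demo[~(demo["Age"] > 40)]            # NOT: condition is False

print(both["First_name"].tolist())
```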

## How to create a new dataframe?

We start by creating a Dictionary object. In our case, we will call it 'raw_data'. Dictionaries consist of pairs of keys and values. We will use column names as keys, and column contents as values. For example, if we wanted to create our good old df from scratch, we would use the string 'sample_num' as the first key. The associated value would be a list containing 1 to 10. Note that the keys here are strings - the names of the columns. 

Let's create a new, simple, dataframe, with the columns 'col_one', 'col_two', and 'col_three':

In [None]:
raw_data = {'col_one': ['A', 'B', 'C', 'D', 'E'], 
            'col_two': [10, 20, 30, 40, 50], 
            'col_three': ['Red', 'White', 'Pink', 'Green', 'Blue']}


Now that we have a dictionary, we can create a dataframe:

In [None]:
new_df = pd.DataFrame(raw_data)
new_df

### Creating a subset dataframe
What if we want to get rid of some columns, and produce a DataFrame with only some of the columns?
We can achieve that without altering the dictionary raw_data, by adding the parameter 'columns' when calling the function DataFrame. For example, let's create a dataframe with col_one and col_three:

In [None]:
new_df2 = pd.DataFrame(raw_data, columns = ['col_one', 'col_three'])
new_df2

Note that we can also create a new dataframe based on the "old" dataframe. For example:

In [None]:
new_df3 = new_df.loc[:, ['col_one', 'col_three']]  # a list of column names
new_df3

### Adding to a dataframe
In order to **add a new row/record** to an existing DataFrame, we wrap the new record in a one-row DataFrame, built from a temporary dictionary (again with column names as keys), and join it to the original with the function **pd.concat**. (The older **append** method was removed in pandas 2.0.) Let's try this:

In [None]:
new_row = pd.DataFrame([{"First_name":"Adam","Last_name":"Cohen","Age":17}])
df = pd.concat([df, new_row], ignore_index=True)
df


Note that since we did not provide values for all the columns, the new row contains NaN values.
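
To find out where values are missing, we can count them per column with `isna`. The sketch below adds the row with `pd.concat` to a small made-up table (the names `demo` and `new_row` are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in data with one column the new row will not fill in.
demo = pd.DataFrame({
    "First_name": ["Ann", "Ben"],
    "Last_name": ["Lee", "Kim"],
    "Age": [34, 51],
    "Weight_before": [80, 95],
})

new_row = pd.DataFrame([{"First_name": "Adam", "Last_name": "Cohen", "Age": 17}])
demo = pd.concat([demo, new_row], ignore_index=True)

print(demo.isna().sum())   # number of missing values in each column
```

Here only Weight_before has a missing value, because it is the only column the new record left out.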

How about **adding a column**? Let's add a column of Gender. To do so, all we need to do is initialize the new column with one value per row. For example: 


In [None]:
df['Gender'] = ['M', 'F', 'F', 'M', 'F', 'M', 'M', 'F', 'M', 'M', 'M']
df
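
A new column can also be computed from existing columns, and a list with the wrong number of values is rejected. A sketch on made-up data, where the column name Weight_lost is a hypothetical addition:

```python
import pandas as pd

# Hypothetical stand-in data with 3 rows.
demo = pd.DataFrame({"Weight_before": [80, 95, 100], "Weight_after": [70, 70, 90]})

# A column computed row by row from two existing columns.
demo["Weight_lost"] = demo["Weight_before"] - demo["Weight_after"]

try:
    demo["Gender"] = ["M", "F"]   # only 2 values for 3 rows
except ValueError as err:
    print("Could not add column:", err)

print(demo)
```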

## Copy by reference

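In Python, assigning a DataFrame to a new name does not copy it: both names refer to the same object (often described as 'Copy by reference'). The results of some operations might surprise you because of this. Here is an example: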

In [None]:
new_df_v2 = new_df
new_df.iloc[1,2] = 'Surprise!!!!!'
new_df

In [None]:
new_df_v2

Even though it looks as if we copied new_df to new_df_v2, both names refer to the same dataframe, so a change made through one name is visible through the other. In order to obtain a truly new and independent copy, we use the method copy: 

In [None]:
new_df_v2 = new_df.copy()
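
The difference between a second name and a real copy can be checked directly. In this sketch the names `original`, `alias`, and `independent` are hypothetical, and the data is made up:

```python
import pandas as pd

# Hypothetical stand-in data.
original = pd.DataFrame({"col_one": ["A", "B"], "col_two": [10, 20]})

alias = original                 # a second name for the same object
independent = original.copy()    # a separate dataframe with the same contents

original.iloc[1, 0] = "changed"

print(alias is original)         # both names point at one object
print(alias.iloc[1, 0])          # the change is visible through the alias
print(independent.iloc[1, 0])    # the copy keeps the old value
```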

## Summary

We hope that by now you are starting to see the coolness of Python and Pandas, not to mention Jupyter notebooks :-)  We encourage you to try these and other Python and Pandas operations within this notebook or a new one. 