# GETTING STARTED

*Text cells* are just comments on the code; nothing happens because of them. To add a *code cell*, click the `+ Code` button at the top.

In [None]:
var_a = 2

Programming is a series of **COMMANDS** which are executed in a certain order. In these notebooks we execute all the commands in a code cell by pressing the Play button on its left or by pressing Shift + Enter.

In [None]:
var_b = 3
var_c = var_a + var_b

Our first code cell had one command: create a new *variable* called `var_a` and set it equal to 2.
> A variable is just something that *holds data*. It has a name, in this case `var_a`. Please use descriptive names for your variables. Names:

*   Are case sensitive (`var_a` is different from `Var_A`)
*   Can only contain letters, numbers and `_`
*   Cannot start with a number
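
The naming rules above can be checked with a small sketch. It assumes nothing beyond plain Python; the example names (`house_price_2`, `2nd_price`, `house-price`) are made up for illustration. The built-in string method `.isidentifier()` tells us whether a piece of text would be accepted as a name.

```python
# Names are case sensitive: these are two separate variables.
var_a = 2
Var_A = 10
names_differ = var_a != Var_A  # True, because 2 is not equal to 10

# .isidentifier() reports whether a string is a syntactically valid name.
valid_name = "house_price_2".isidentifier()     # letters, numbers and _ only
starts_with_digit = "2nd_price".isidentifier()  # starts with a number
has_dash = "house-price".isidentifier()         # contains a character that is not allowed

print(names_differ, valid_name, starts_with_digit, has_dash)
# True True False False
```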




Our second code cell had 2 commands, and we expect the variable `var_c` to have a value of 5. 

In [None]:
print(var_c)

5


In the "old days" we would have to use a `print()` command to reveal the value of a variable (thus far it's all been hidden), but in notebooks there's another way called **INSPECTION**.

In [None]:
var_c

5

Just by putting the variable `var_c` on its own in a code cell we can immediately see its value. Note that the code cell above doesn't change anything, so we can run it as many times as we want, inspecting different variables.

# TYPES OF DATA

Variables hold data, and it's important to know what type of data (integer, string, etc.) is being held. Four common types we will use for now are:
![alt text](https://i.imgur.com/4sIlRgP.png)

You can see what data type a variable is by using the `type()` function. 

In [None]:
type(var_a)

int

The above line is actually still an inspection. We aren't inspecting a variable directly, but we are inspecting the value that `type(var_a)` gives back. This will become clearer later, but since it's an inspection, we can re-run that cell!

In [None]:
var_int = 42
var_float = 2.1
var_string = 'hello world'
var_list = [1,3,5]

Above we declared a few more variables; feel free to make your own!

In [None]:
var_list

[1, 3, 5]
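
To connect these variables back to `type()`, here is a minimal sketch that prints the type of each one. The variables are redefined so the block runs on its own; the loop is just one convenient way to do it, and calling `type()` on each variable in a separate cell works equally well.

```python
var_int = 42
var_float = 2.1
var_string = 'hello world'
var_list = [1, 3, 5]

# type() gives back the class of the value each variable holds.
for name, value in [('var_int', var_int), ('var_float', var_float),
                    ('var_string', var_string), ('var_list', var_list)]:
    print(name, type(value).__name__)
# var_int int
# var_float float
# var_string str
# var_list list
```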

The last thing we haven't seen yet is functions, denoted by parentheses `( )`. Many data types have built-in functions, which can be used by all variables of that type.

In [None]:
var_newlist = var_list.copy()

In [None]:
var_newlist

[1, 3, 5]

E.g. lists have a function called `.copy()` which makes a duplicate of the list, which we stored in another variable, `var_newlist`. The benefit is that changes to `var_newlist` will not affect `var_list`. We will NEVER do this:
```
var_newlist = var_list
```
That line does not make a copy: both names refer to the same list, so a change made through either variable affects the other, which gets confusing.
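
A short sketch of the difference, assuming a throwaway name `var_alias` (not used elsewhere in this notebook) for the assignment we want to avoid:

```python
var_list = [1, 3, 5]

var_alias = var_list           # NOT a copy: a second name for the same list
var_newlist = var_list.copy()  # an independent copy

var_alias.append(7)    # a change made through the alias...
var_newlist.append(9)  # ...and a change made to the copy

print(var_list)     # [1, 3, 5, 7]  (changed through var_alias)
print(var_newlist)  # [1, 3, 5, 9]  (var_list did not get the 9)
print(var_alias is var_list, var_newlist is var_list)  # True False
```

Note that `.copy()` is a shallow copy, which is all we need for a list of plain numbers like this one.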


In [None]:
## MANDATORY EXERCISE: Make an integer, float, string and list variable,
## then make a copy of the list variable under a new name, using the copy FUNCTION!

# DO NOT RUN CELLS OUT OF ORDER!!

In [None]:
var_a = 5

In [None]:
var_b = var_a + 2

In [None]:
var_b

4

We expect `var_b` to be 7, right? NO! Because we ran the cell `var_b = var_a + 2` before running `var_a = 5`, the command used the old value of `var_a` (2). The [numbers] beside each cell tell us the order in which the cells were run.

Notebooks are meant to be read **TOP-DOWN**, so go to
> Runtime -> Restart and Run all


This will remove all variables and run every code cell from top to bottom. Running cells out of order **IS A COMMON MISTAKE AND THE SOURCE OF A LOT OF FRUSTRATION**, as you will see later, so use *Restart and Run all* often to avoid this headache.
> *Restart and Run all* is DIFFERENT FROM *Run all*
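
The out-of-order problem can be imitated in plain Python. This is only a sketch: each function stands in for one notebook cell, and the names `state`, `cell_set_a` and `cell_compute_b` are hypothetical helpers, not anything Colab provides.

```python
# The dictionary plays the role of the notebook's memory.
state = {'var_a': 2}  # value left over from the start of the notebook

def cell_set_a():
    state['var_a'] = 5

def cell_compute_b():
    state['var_b'] = state['var_a'] + 2

# Out of order: compute var_b first, then set var_a.
cell_compute_b()
cell_set_a()
out_of_order = state['var_b']  # used the old var_a, so 2 + 2

# Top-down: start from a fresh memory and run the cells in order.
state = {'var_a': 2}
cell_set_a()
cell_compute_b()
top_down = state['var_b']      # used the new var_a, so 5 + 2

print(out_of_order, top_down)
# 4 7
```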

# First Dataframe!

Lists are one way of storing data, but we will be using **pandas dataframes**. The name *pandas* is derived from "panel data", and dataframes are usually written as `df`. Dataframes are not native to Python; they live in a library (of elaborate Python code) written by the community, which we can use with a simple `import` statement.

In [None]:
import pandas as pd

We just imported pandas under the short name `pd`, along with all its amazing functions (such as `pd.read_csv()` and `pd.DataFrame()`) for us to use.
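
As a small taste of `pd.DataFrame()`, here is a sketch that builds a tiny dataframe by hand. The name `df_toy` and the values in it are made up purely for illustration; the only assumption is that pandas is installed, which it is in Google Colab.

```python
import pandas as pd

# Hypothetical toy data: each key becomes a column, each list holds that column's values.
df_toy = pd.DataFrame({
    'city': ['A', 'B', 'C'],
    'population': [1015, 1129, 333],
})

print(df_toy.shape)          # (3, 2): 3 rows, 2 columns
print(list(df_toy.columns))  # ['city', 'population']
```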

In [None]:
df_sample = pd.read_csv('/content/sample_data/california_housing_train.csv')

Here we make a dataframe based on a CSV file (a common way of storing data; Excel and Google Sheets can read it too. Learn more about CSV files [here](https://www.howtogeek.com/348960/what-is-a-csv-file-and-how-do-i-open-it/)).
When we inspect it, we can see there are roughly 17,000 rows of data and 9 columns. Don't be afraid: pandas makes this very easy to work with, as we will soon see.

In [None]:
df_sample

|  | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.31 | 34.19 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.4936 | 66900.0 |
| 1 | -114.47 | 34.40 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.8200 | 80100.0 |
| 2 | -114.56 | 33.69 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.6509 | 85700.0 |
| 3 | -114.57 | 33.64 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.1917 | 73400.0 |
| 4 | -114.57 | 33.57 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.9250 | 65500.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16995 | -124.26 | 40.58 | 52.0 | 2217.0 | 394.0 | 907.0 | 369.0 | 2.3571 | 111400.0 |
| 16996 | -124.27 | 40.69 | 36.0 | 2349.0 | 528.0 | 1194.0 | 465.0 | 2.5179 | 79000.0 |
| 16997 | -124.30 | 41.84 | 17.0 | 2677.0 | 531.0 | 1244.0 | 456.0 | 3.0313 | 103600.0 |
| 16998 | -124.30 | 41.80 | 19.0 | 2672.0 | 552.0 | 1298.0 | 478.0 | 1.9797 | 85800.0 |
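
Instead of counting rows and columns by eye, we can ask the dataframe for its `.shape`. The sketch below is an assumption-laden stand-in for the real file: it reads a few rows and columns from a piece of text held in memory (via `io.StringIO`) so that it runs anywhere, and the name `df_small` is made up for this example. On the real data you would use `df_sample.shape` and `df_sample.head()` in the same way.

```python
import io
import pandas as pd

# A few rows and columns from the housing data, held in memory instead of in a file.
csv_text = """longitude,latitude,housing_median_age,median_house_value
-114.31,34.19,15.0,66900.0
-114.47,34.40,19.0,80100.0
-114.56,33.69,17.0,85700.0
"""

# read_csv accepts a file path or a file-like object such as io.StringIO.
df_small = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = df_small.shape  # (number of rows, number of columns)
print(n_rows, n_cols)            # 3 4
print(df_small.head(2))          # shows only the first 2 rows
```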


If you were wondering where the file came from, click the file folder symbol on the left of Google Colab. 
> *Folder symbol* -> sample_data -> *RIGHT CLICK* california_housing_train.csv -> Copy path

(the `.read_csv()` function requires a filepath, which we paste into the function)