This notebook is a one-stop shop for all the key Python syntax from Module 1. It is a living document that will be updated throughout the Python Foundations course.

Its purpose is to serve as a reference (essentially for copy-pasting) whenever you are stuck writing code yourself.

As an exercise, read through each code block and try to understand what each piece of code does. Do not try to memorize the code; just become familiar with the syntax.

## Table Of Contents:

* [Data Types](#bullet-1)
    * [Numeric](#bullet-1.1)
    * [String](#bullet-1.2)
    * [Boolean](#bullet-1.3)
    * [Data Type Conversions](#bullet-1.4)
* [Data Collection Types (includes indexing, slicing and looping)](#bullet-2)
    * [Tuples and Lists](#bullet-2.1)
    * [Sets](#bullet-2.2)
    * [Dictionaries](#bullet-2.3)
* [Functions](#bullet-3)
* [Pandas Dataframe - to be updated](#bullet-4)

# Data Types <a class="anchor" id="bullet-1"></a>

Data falls into different categories: numeric, text, true/false, complex numbers, dates, times and so on. Data must be stored as the correct type for us to be able to perform the desired operations on it. If it is stored incorrectly, for example numerical data stored as text, we cannot do arithmetic on it, because the methods and operations available for an object depend largely on its data type.
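
To make this concrete, here is a minimal sketch of how the same + operator behaves depending on how the values are stored. The variable names and values are made up for illustration.

```python
# Hypothetical values: the same digits stored as text
price_as_text = "5"
tax_as_text = "3"

# With strings, + joins the text together instead of adding
joined = price_as_text + tax_as_text

# Converting to int first gives real arithmetic
total = int(price_as_text) + int(tax_as_text)

print(joined)  # 53
print(total)   # 8
```

The conversion functions used here (int, float, str) are covered in more detail in the Data Type Conversions section.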

There are various data types in Python. Listed below are the most prominently used ones:
1. Numeric - This is further divided into 
    (a.) Integer
    (b.) Float
    
2. String

3. Boolean 

### Numeric Data Type <a class="anchor" id="bullet-1.1"></a>

As the name suggests, the numeric data type consists of numbers. These numbers can be whole numbers, decimal numbers or complex numbers.

Whole numbers are stored in the **Integer** (int) type.

Decimal numbers are stored in the **Float** (float) type, which is a 64-bit floating-point number.

You can check the data type of an object using the type() function.

In [2]:
#Let's look at some examples.

Integer = 7
type(Integer)

int

In [3]:
3*3

9

In [8]:
mean_heights_sydney = 54
mean_heights_adelaide = 23

(mean_heights_sydney + mean_heights_adelaide)/2

38.5

In [74]:
def fi(x):
    '''this is a function that takes in a value of x (an int) and returns a string'''
    
    return str(x)

In [2]:
Float = 2.356
type(Float)

float

In [3]:
Complex = 3.908J
type(Complex)

complex

In [76]:
#first run the def fi(x)... code block. This way, the function fi is stored in your kernel session
#so when you say fi (like below), it knows you are referring to the function defined above
fi(2)

#notice the output is in quotes. Do you recognize this data type? If not, see below

'2'

### String Data Type <a class="anchor" id="bullet-1.2"></a>

The string data type is usually used to store text. The data stored in this type is enclosed in single ('') or double ("") quotes.
Recall that you printed your name in the previous notebook. That was a string.
Let's look at an example.

In [2]:
# Printing your name

My_Name = "My name is Jupyter!"
My_Name

'My name is Jupyter!'

In [79]:
# works with single quotes as well. No difference from above.

My_Name = 'My name is Jupyter!'
My_Name

'My name is Jupyter!'

In [6]:
type(My_Name)

str

Strings support many operations, such as searching within a string, converting to lowercase/uppercase, counting, measuring length, splitting, replacing, trimming and partitioning.
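
As a quick illustration of a few of these operations, here is a small sketch using built-in string methods. The example sentence is made up and is not part of the course material.

```python
# Hypothetical example string with extra spaces around it
sentence = "  Python is fun, and Python is readable  "

trimmed = sentence.strip()                    # trimming: drop leading/trailing spaces
position = trimmed.find("fun")                # searching: index where "fun" starts (-1 if absent)
python_count = trimmed.count("Python")        # counting occurrences of "Python"
lowered = trimmed.lower()                     # conversion to lowercase
words = trimmed.split()                       # splitting on whitespace into a list of words
replaced = trimmed.replace("fun", "useful")   # replacing one piece of text with another
before, separator, after = trimmed.partition(", ")  # partitioning at the first ", "

print(position)      # 10
print(python_count)  # 2
print(len(words))    # 7
print(before)        # Python is fun
print(after)         # and Python is readable
```

Note that none of these methods change the original string. Each one returns a new value, which is why the results are stored in new variables.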

In [8]:
# Let's check how many characters My_Name contains.
len(My_Name)

19

In [20]:
# Let's see if My_Name is all caps or not.
# isupper() returns True if all the letters are capitals, and False if at least one letter is in lower case.

My_Name.isupper()

False

In [23]:
# Let's convert My_Name to all caps.

My_Name = My_Name.upper()
My_Name

'MY NAME IS JUPYTER!'

In [3]:
# Now the output of isupper() changes to True.
My_Name.isupper()

True

In [4]:
My_Name[0]

'M'

In [5]:
My_Name[1]

'Y'

In [6]:
My_Name[0:2]

'MY'

In [7]:
My_Name[0:4]

'MY N'

### Boolean Data Type <a class="anchor" id="bullet-1.3"></a>

The boolean data type has just two values, i.e., True or False.

In [81]:
# Let's look at examples.

x = True
y = False

In [82]:
type(x)

bool

In [83]:
type(y)

bool

In [90]:
if x == True:
    print("Ok, makes sense")
else:
    print("this is weird")

Ok, makes sense


In [91]:
#for bool variables, True equals 1 and False equals 0
if x == 1:
    print("True is also 1")

True is also 1
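
One practical consequence of True behaving like 1 is that you can count how many conditions are met by adding booleans up. The sketch below assumes a made-up list of temperatures and uses a list comprehension, a compact way to write a loop that builds a list.

```python
# Hypothetical temperatures, used only for illustration
temperatures = [18, 25, 31, 22, 35]

# Each comparison produces a bool (True or False)
is_hot = [t > 30 for t in temperatures]

# Because True counts as 1 and False as 0, sum() counts the True values
hot_days = sum(is_hot)

print(is_hot)    # [False, False, True, False, True]
print(hot_days)  # 2
```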


### Data Type Conversions <a class="anchor" id="bullet-1.4"></a>

As explained earlier, data should be stored in the correct form so that it can be manipulated efficiently later.
Quite often, data is not stored correctly or gets imported incorrectly.
In such cases, the data needs to be converted to its correct type so that we can use it properly in our analysis.
Let's look at some data type conversion exercises to get you comfortable with the process.

In [37]:
# Consider an object containing an integer.
string1 = 1
# I want to use the value 1 as a text value. So let's convert it.
string1 = str(string1)
type(string1)

str

In [45]:
# Now let's do the opposite and convert a string to a float.
x = '3'
x = float(x)
type(x)

float

In [53]:
int1 = "500"
type(int1)

str

In [55]:
int1 = int(int1)
type(int1)

int
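
Conversions only work when the text actually looks like the target type. The sketch below shows what happens otherwise; to_int_or_none is a hypothetical helper written for this example, not a built-in function.

```python
def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if the text is not a whole number."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError when the text cannot be read as a whole number
        return None

print(to_int_or_none("500"))  # 500
print(to_int_or_none("abc"))  # None
print(to_int_or_none("3.7"))  # None, because int() does not accept decimal text

# Going through float first works, but the decimal part is cut off (not rounded)
print(int(float("3.7")))      # 3
```

If you run int("abc") directly, without the try/except, you will see the ValueError message itself.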

## Data Collection Types (with looping, indexing and slicing) <a class="anchor" id="bullet-2"></a>

### Tuples and Lists <a class="anchor" id="bullet-2.1"></a>

Tuples are defined with (). Lists are defined with [].

Both accept any data type (e.g. float, string, int etc.)
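
One difference worth knowing, beyond the brackets: a list can be changed after it is created, while a tuple cannot. Here is a minimal sketch with made-up values.

```python
colours_list = ["red", "green", "blue"]
colours_tuple = ("red", "green", "blue")

# Lists are mutable: replacing an item works
colours_list[0] = "yellow"

# Tuples are immutable: the same assignment raises a TypeError
tuple_error = None
try:
    colours_tuple[0] = "yellow"
except TypeError as error:
    tuple_error = str(error)
    print("Cannot change a tuple:", error)

print(colours_list)   # ['yellow', 'green', 'blue']
print(colours_tuple)  # ('red', 'green', 'blue')
```

This is one reason a fixed collection, such as the months of the year, is a natural fit for a tuple.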

In [43]:
random_tuple = ("da", 32,5.3, [3,4]) #notice here I'm putting a list inside a tuple!
random_tuple

('da', 32, 5.3, [3, 4])

In [44]:
tuple_months = ('January','February','March','April','May','June',
                'July','August','September','October','November','December')

In [21]:
tuple_months

('January',
 'February',
 'March',
 'April',
 'May',
 'June',
 'July',
 'August',
 'September',
 'October',
 'November',
 'December')

In [22]:
tuple_months[3]

'April'

In [47]:
list_cats = ['Tom', 'Snappy', 'Kitty', 'Jessie', 'Chester', ['Beck', "Dairy"]]

In [48]:
list_cats

['Tom', 'Snappy', 'Kitty', 'Jessie', 'Chester', ['Beck', 'Dairy']]

In [49]:
print(list_cats[2])

Kitty


In [50]:
print(list_cats[0])

Tom


In [51]:
print(list_cats[0:3])

['Tom', 'Snappy', 'Kitty']


In [52]:
list_cats.append('Catherine')

In [53]:
list_cats

['Tom', 'Snappy', 'Kitty', 'Jessie', 'Chester', ['Beck', 'Dairy'], 'Catherine']

In [54]:
del list_cats[1]

In [55]:
list_cats

['Tom', 'Kitty', 'Jessie', 'Chester', ['Beck', 'Dairy'], 'Catherine']

In [56]:
index_locator = 0
for cats in list_cats:
    if cats == "Kitty":
        print("Found the index! It is", index_locator)
        # the break command ends the for loop as soon as this condition is satisfied,
        # even if the loop has not reached the end of the list
        break
    else:
        index_locator = index_locator + 1 #add one to the variable

Found the index! It is 1


In [57]:
list_cats[index_locator]

'Kitty'

In [58]:
list_cats[1]

'Kitty'
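
The manual counter used in the loop above can also be written more compactly. The sketch below shows two alternatives using Python built-ins: enumerate, which hands you the position and the item together, and the list method index. The list here is a shortened, made-up copy so that the block runs on its own.

```python
cats = ['Tom', 'Kitty', 'Jessie', 'Chester']

# enumerate gives (position, item) pairs, so no separate counter is needed
found_at = None
for position, cat in enumerate(cats):
    if cat == "Kitty":
        found_at = position
        break

print(found_at)             # 1

# list.index returns the position of the first match directly
print(cats.index("Kitty"))  # 1
```

Keep in mind that index raises a ValueError if the item is not in the list, whereas the loop version simply leaves found_at as None.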

In [60]:
list_cats[4]
#what is the data collection type of this?

['Beck', 'Dairy']

In [61]:
type(list_cats[4])

list

In [62]:
list_cats[4][0]

'Beck'

### Sets <a class="anchor" id="bullet-2.2"></a>

Sets are written between curly brackets {}. Do not confuse them with dictionaries, which also use curly brackets but have a : symbol between each key and its value.

In [13]:
my_set = {1, 2, 3}

In [14]:
your_set = {4, 2, 5}

In [16]:
my_set | your_set

{1, 2, 3, 4, 5}

In [17]:
my_set & your_set

{2}

In [15]:
my_set - your_set

{1, 3}
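
Two more things about sets are handy in practice: they keep only unique values, and an empty pair of curly brackets does not create a set. The following sketch uses made-up data to show both, plus the ^ operator for items that appear in exactly one of two sets.

```python
# Hypothetical list with repeated names
visitors = ["ann", "bob", "ann", "cara", "bob"]

# Converting to a set drops the duplicates
unique_visitors = set(visitors)
print(len(unique_visitors))  # 3

# Empty curly brackets create a dictionary; use set() for an empty set
empty_braces = {}
empty_set = set()
print(type(empty_braces))    # <class 'dict'>
print(type(empty_set))       # <class 'set'>

# ^ keeps the items that are in one set or the other, but not in both
only_in_one = {1, 2, 3} ^ {4, 2, 5}
print(only_in_one == {1, 3, 4, 5})  # True
```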

### Dictionaries <a class="anchor" id="bullet-2.3"></a>

Dictionaries are represented as {key:value}

In [18]:
CO2_by_year = {1799:1, 1800:70, 1801:74, 1802:82, 1902:215630, 2002:1733297}

In [19]:
# Look up the emissions for the given year
CO2_by_year[1801]

74

In [20]:
# Add another year to the dictionary
CO2_by_year[1950] = 734914

In [21]:
CO2_by_year

{1799: 1,
 1800: 70,
 1801: 74,
 1802: 82,
 1902: 215630,
 2002: 1733297,
 1950: 734914}

In [22]:
CO2_by_year[2009] = 1000000
CO2_by_year[2000] = 100000

In [23]:
1950 in CO2_by_year

True

In [24]:
len(CO2_by_year)

9

In [25]:
del CO2_by_year[1950]

In [26]:
len(CO2_by_year)

8

In [27]:
for key in CO2_by_year:
    print(key)

1799
1800
1801
1802
1902
2002
2009
2000


In [28]:
for k in CO2_by_year.keys():
    print(k)

1799
1800
1801
1802
1902
2002
2009
2000


In [29]:
for v in CO2_by_year.values():
    print(v)

1
70
74
82
215630
1733297
1000000
100000


In [30]:
CO2_by_year.values()

dict_values([1, 70, 74, 82, 215630, 1733297, 1000000, 100000])

In [31]:
for key, value in CO2_by_year.items():
    print(key, value)

1799 1
1800 70
1801 74
1802 82
1902 215630
2002 1733297
2009 1000000
2000 100000
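
Looking up a key that is not in the dictionary raises a KeyError. If you would rather get a fallback value, the dictionary method get is an option. This is a small sketch on a shortened, made-up copy of the dictionary so that it runs on its own.

```python
co2 = {1799: 1, 1800: 70, 1801: 74}

# Square brackets raise a KeyError when the key is missing
try:
    co2[1600]
except KeyError:
    print("1600 is not in the dictionary")

# get() returns None for a missing key, or a fallback value if you supply one
missing = co2.get(1600)
fallback = co2.get(1600, 0)
present = co2.get(1801, 0)

print(missing)   # None
print(fallback)  # 0
print(present)   # 74
```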


# Functions <a class="anchor" id="bullet-3"></a>

In [64]:
#You can name the function anything. In this case, we name it convert_to_celsius
def convert_to_celsius(fahrenheit):
    ''' (number) -> number
    Return the celsius degrees equivalent to
    fahrenheit degrees.
    '''
    celsius = (fahrenheit - 32) * 5 / 9
    return celsius #this is the returned output by the function

In [65]:
convert_to_celsius(32)

0.0

In [95]:
# If you pass a string instead of a number, the function will not work.
# Uncomment the code below by removing the hash (#) sign, then run it.
# When you see the TypeError message, you will know what it means.



# convert_to_celsius('3')
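
If you want to see the error without stopping the notebook, one option is to catch it with try/except, as in this sketch. The function is repeated here only so that the block runs on its own.

```python
def convert_to_celsius(fahrenheit):
    """Same formula as the function defined above."""
    return (fahrenheit - 32) * 5 / 9

# Passing text fails because Python cannot subtract a number from a string
message = None
try:
    convert_to_celsius('3')
except TypeError as error:
    message = str(error)
    print("TypeError:", error)

# Converting the text to a number first makes the call work
result = convert_to_celsius(float('212'))
print(result)  # 100.0
```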

In [66]:
convert_to_celsius(212)

100.0

In [67]:
convert_to_celsius(-40)

-40.0

In [68]:
#You can store the value returned by the function in a variable

returned_value_variable = convert_to_celsius(-40)
returned_value_variable

-40.0
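
Because a function returns a value, you can call it inside a loop and collect the results. Below is a short sketch combining the function with a list and a for loop; the readings are made-up values, and the function is repeated so that the block is self-contained.

```python
def convert_to_celsius(fahrenheit):
    """Same formula as the function defined above."""
    return (fahrenheit - 32) * 5 / 9

# Hypothetical Fahrenheit readings
readings_f = [32, 212, -40]

# Convert each reading and store the result in a new list
readings_c = []
for reading in readings_f:
    readings_c.append(convert_to_celsius(reading))

print(readings_c)  # [0.0, 100.0, -40.0]
```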

# Pandas Dataframe (in progress) <a class="anchor" id="bullet-4"></a>

In [98]:
#make sure you import the pandas library in your notebook
import pandas as pd 
import seaborn as sns

In [100]:
data = sns.load_dataset("iris")

In [107]:
#the pandas dataframe is one of the most powerful tabular data processing tools out there for data science
type(data)

pandas.core.frame.DataFrame

In [104]:
#check out the pandas head documentation here https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.head.html
#notice how, in the parameters, it expects an int type variable. This is why knowing the data types is important.
# if you don't give it an int, the .head method will throw an error.


#defaults to first five rows. But you can give another int variable inside the brackets
#you could also just say data.head(2), but I've followed the documentation exactly for clarity.
data.head(n=2)

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |


In [33]:
# Looking at the last few rows of the data frame
#can you find the pandas official documentation for the tail method? What parameters/arguments does it expect? 
data.tail(5)

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |


In [30]:
#number of rows,columns
#don't know the syntax for a tabular operation? Google the question along with "pandas dataframe", and you'll likely find the syntax on Stack Overflow
# for example here: https://stackoverflow.com/questions/13921647/python-dimension-of-data-frame

data.shape

(150, 5)

In [105]:
#a dataframe is comprised of columns that are called Series. Series is a data collection type.

type(data["sepal_length"])

pandas.core.series.Series

In [110]:
#as you've learned, each column has a type.
# "object" usually means string here. You've seen the float type before.
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
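
The column types matter for the same reason as in the Data Type Conversions section: numbers stored as text cannot be averaged. The sketch below assumes a tiny hand-made DataFrame (not the iris data) whose height column was read in as text, and converts it with the Series method astype.

```python
import pandas as pd

# Hypothetical table where the numbers arrived as text
raw = pd.DataFrame({"height": ["1.5", "2.0", "3.25"]})

# Before conversion the column holds strings, so it is not a numeric dtype
print(raw["height"].dtype)

# astype(float) converts every value in the column to a float
raw["height"] = raw["height"].astype(float)

print(raw["height"].dtype)   # float64
print(raw["height"].mean())  # 2.25
```

The exact name printed for the text dtype can differ between pandas versions, so no expected output is shown for the first print.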


In [108]:
#the series object comes with some methods.
# you can browse the available methods in the pandas documentation, for example starting from the nunique page: https://pandas.pydata.org/docs/reference/api/pandas.Series.nunique.html

data["sepal_length"].mean()


5.843333333333335

In [111]:
data["species"].value_counts()

versicolor    50
setosa        50
virginica     50
Name: species, dtype: int64

In [112]:
#there are three unique values in column species
data["species"].nunique()

3

In [36]:
data["sepal_width"][3]

3.1

In [12]:
#indexable
data["sepal_length"][0]

5.1

In [13]:
#mutable: the first value in the pandas Series could be changed by assigning to it. Here we only read it, so the data stays unchanged.
data["sepal_length"][0]

5.1

In [14]:
data.head(1)

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |


In [15]:
data["sepal_length"][0]

5.1
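
To see mutability in action without touching the iris data, here is a minimal sketch on a small hand-made DataFrame. It uses the .loc accessor, which selects a cell by row label and column name. The table and the new value are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row table, standing in for the iris data
small = pd.DataFrame({"sepal_length": [5.1, 4.9],
                      "species": ["setosa", "setosa"]})

# .loc[row_label, column_name] selects one cell; assigning to it changes the data
small.loc[0, "sepal_length"] = 9.9

print(small.loc[0, "sepal_length"])    # 9.9
print(small["sepal_length"].tolist())  # [9.9, 4.9]
```

Assigning through .loc on the DataFrame itself is generally the safer pattern, because chained indexing such as small["sepal_length"][0] = 9.9 can trigger warnings or fail to update the original table, depending on the pandas version.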

In [115]:
data[["sepal_length", "sepal_width"]].head(2)

| | sepal_length | sepal_width |
|---|---|---|
| 0 | 5.1 | 3.5 |
| 1 | 4.9 | 3.0 |


In [16]:
#you can access columns with dot notation as well. But my advice is to always use the data["col_name"] notation, because it extends to the general case of selecting more than one column: data[["col_name_1", "col_name_2", ...]]
data.sepal_length.mean()

5.843333333333335

In [17]:
#use single [] bracket to return series.
type(data["sepal_length"])

pandas.core.series.Series

In [117]:
#use double [[]] bracket to return dataframe.

type(data[["sepal_length"]])

pandas.core.frame.DataFrame

In [19]:
#which column is missing here?
# species is missing because it is not a numeric column/series
data.describe()

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.057333 | 3.758 | 1.199333 |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |


The dataset has 150 rows of observations and 5 columns.