## Introduction to Colab

Colab is an online Python notebook environment that comes with many packages preinstalled.

In [None]:
# Use the Play (▷) button next to each cell to run that cell, 
# The (↑↓) buttons to change the order of cells, 
# And the 

print('Welcome to Information Retrieval labs!')

In [None]:
## To allow file sharing, you can connect Colab to your Google Drive files by running the code below.
# Follow the instructions in the pop-up window to authorise access.

from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
## [CODING TIP] Colab Shortcuts

# Shift + Enter to run a cell 
# B to add a new cell below 
# A to add a new cell above

# You can discover more by going to Tools > Keyboard Shortcuts

## Python Recap

We'll be using several common Python packages and built-in functions in this module. Let's review them.

In [None]:
## For loops and If statements

for i in range(10):    # iterate over a range of 10, i.e. i = 0, 1, ..., 9
  if i % 2 == 0:       # if i is even, print i
    print(i)

In [None]:
## [CODING TIP] The even numbers can also be collected more concisely using a list comprehension
evens = [i for i in range(10) if i % 2 == 0]   # build the list first, then print it once
print(evens)

In [None]:
## Functions: These are particularly useful if you want to repeat a process multiple times

def multiply(a, b):  # this function takes two numbers and returns their product
  return a*b

multiply(3,4)

In [None]:
## What is a set? A set is an unordered collection of unique items

# Note how the output set removes any duplicates
print(set([2,2,3,4]))   

# The Intersection (&) of two sets is a set of items that are common to both sets
print(set([2,3,4]) & set([2]))

# The Union (|) of two sets is a set of all elements from both sets
print(set([2,3,4]) | set([2]))
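
Two further set operations follow the same pattern: difference (`-`) and membership testing (`in`). The short sketch below uses made-up values; the names `a`, `b`, `only_in_a` and `has_three` are purely illustrative.

```python
# Illustrative values only
a = {2, 3, 4}
b = {2, 5}

only_in_a = a - b      # difference: items in a that are not in b
has_three = 3 in a     # membership test: is 3 an element of a?

print(only_in_a, has_three)
```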

In [None]:
## Pandas is a package used to work with structured or tabular data.

import pandas as pd

# Let's start by reading a CSV file containing information about California housing into a pandas DataFrame.
df = pd.read_csv('/content/sample_data/california_housing_train.csv')

In [None]:
# display dataframe
df

In [None]:
# Exploring some dataframe functions
print('Dimensions of dataframe:', df.shape)
print('\nDataframe attribute (column) names:\n', df.columns)
print('\nBasic statistics about dataframe:\n', df.describe())

In [None]:
# Column data can be selected using the following syntax
print('Values for the DataFrame attribute `total_rooms`:\n', df['total_rooms'])

In [None]:
# A more advanced technique to select column and row data is to use the `iloc` and `loc` indexers:
# The syntax for 'loc' is as follows: df.loc[row label, column label]
print(df.loc[1, 'longitude'])
print(df.loc[16999, 'population'])


In [None]:
# ':' can be used to select all rows (or all columns)
print(df.loc[:, 'latitude'])        

In [None]:
# 'iloc' works the same way, but we use integer positions for rows/columns rather than labels
# e.g. df.iloc[row position, column position]
print(df.iloc[0,0])   # selecting data from the first row and first column
print(df.iloc[4,:])   # selecting all data from the fifth row
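
The difference between `loc` and `iloc` is easiest to see when the row labels are not the default 0, 1, 2, ... The sketch below uses a small made-up DataFrame; the name `toy` and its values are assumptions for illustration and are not part of the housing data.

```python
import pandas as pd

# Hypothetical toy data: row labels are strings, so labels and positions differ
toy = pd.DataFrame(
    {"rooms": [3, 5, 2], "population": [120, 340, 80]},
    index=["a", "b", "c"],
)

by_label = toy.loc["b", "population"]   # row labelled 'b', column named 'population'
by_position = toy.iloc[1, 1]            # second row, second column (the same cell)
print(by_label, by_position)

# Slices differ too: loc includes the end label, iloc excludes the end position
rows_by_label = toy.loc["a":"b", "rooms"].tolist()   # rows 'a' and 'b'
rows_by_position = toy.iloc[0:1, 0].tolist()         # only the first row
print(rows_by_label, rows_by_position)
```

In the housing DataFrame the row labels happen to equal the row positions, which is why `df.loc[1, ...]` and `df.iloc[1, ...]` point at the same row there.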

In [None]:
## [CODING TIP] Official documentation is your friend. 
# e.g. Not sure how to use a pandas DataFrame function? Look up the documentation: https://pandas.pydata.org/docs/reference/frame.html

In [None]:
## For basic math functions, NumPy is an excellent package

import numpy as np

In [None]:
# A NumPy array is a data structure, like a Python list, but with additional functionality
x = np.array([2,-10,2,-7,5])

print('Mean of array: ',np.mean(x))
print('Minimum value in array: ',np.min(x))
print('Absolute values of array: ',np.abs(x))
print('Dimensions of array: ',np.shape(x))

In [None]:
## [CODING TIP] To iterate through a NumPy array or pandas DataFrame, it is possible to use for loops.
# However, these can be inefficient, especially with large datasets.
# The preferred method is to use built-in functions that enable *vectorization*.
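
As a rough illustration of the tip above, the sketch below doubles every element of an array twice: once with an explicit loop and once with a single vectorized expression. The variable names are illustrative, and the array is the same small example used earlier.

```python
import numpy as np

values = np.array([2, -10, 2, -7, 5])

# Loop version: visit each element in Python and build a list
doubled_loop = []
for v in values:
    doubled_loop.append(v * 2)

# Vectorized version: one expression applied to the whole array at once
doubled_vec = values * 2

print(doubled_loop)
print(doubled_vec)

# Comparisons are vectorized as well: count the negative entries
n_negative = int((values < 0).sum())
print(n_negative)
```

On five elements the two versions are equally fast; the gap only becomes noticeable on large arrays or DataFrames.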

## Bash Commands

In [None]:
# Bash commands are usually reserved for the terminal or command line; however, you can run some simple ones in Colab by prefixing them with '!'

# e.g. To install a new package that does not come preinstalled in Colab
!pip install nbconvert

In [None]:
# Or to download a file from the internet into Colab
!wget https://i.ytimg.com/vi/RJIABsgnTHk/maxresdefault.jpg

## If you finish early...

In [None]:
# QUESTION 1: 
# Write a function that takes a list of numbers and returns the pairs of numbers from the list that add up to 10.
# Test your function with some dummy inputs

In [None]:
# QUESTION 2:
# Write a function that takes our previous housing df as input, 
# selects data entries with households greater than 450, 
# and returns the mean population of that selection
# Test your function with some dummy inputs