The ability to learn from data defines our scientific endeavour. Data hides useful information, whether that information is in spreadsheets, images, sounds, or even the written word. Modern tools are making the extraction of knowledge from data ever more powerful and, perhaps more importantly, easier.

A quick YouTube search will show an endless supply of tutorials on data science and Python. Google, Stack Overflow, Kaggle and Towards Data Science are also good places to look.

Resources used to produce this material:

* https://www.kaggle.com/learn/pandas
* https://docs.python.org/3/tutorial/index.html
* https://devdocs.io/pandas~0.25/reference/api/pandas.dataframe

# Lists

## Create a list

A list in Python is created using a pair of square brackets. The elements are separated by commas. A list can hold different data types, such as integers and strings (character data).

The positions of the elements within a list start from 0 (zero): [0 | 1 | 2 | 3 | ...]. So in a list with elements [a, b, c, d, e, f], a is at position zero, b is at position one, c is at position two, and so on.

In [161]:
my_list = [1,2,3,4,5,6] 

In [162]:
# list index, get the first value within the list
print(my_list[0])

1


In [163]:
# get the last value from the list, using negative indexing 
my_list[-1]

6

In [173]:
# delete the first item in the list
# (pop() and remove() can also delete elements from a list)
del my_list[0]
my_list

[2, 3, 4, 5, 6]

In [165]:
# Use the length function to get the number of elements in the list
len(my_list)

5

In [67]:
# add the integer 7 to the list
# append() adds an element to the end of the list; pass the exact value you want to add
my_list.append(7)
my_list

[2, 3, 4, 5, 6, 7]

# Exercise

#### Question 1
Create a list called list1 and assign the digits 0 to 9 to it.

In [177]:
list1 = [0,1,2,3,4,5,6,7,8,9]

#### Question 2
Output the first element in the list.

In [178]:
list1[0]

0

#### Question 3
Output the last element in the list.

In [179]:
list1[-1]

9

#### Question 4
Delete the value 3 from your list.

In [180]:
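# del removes by position; position 3 happens to hold the value 3 here
# (list1.remove(3) would delete by value instead)
del list1[3]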
list1

[0, 1, 2, 4, 5, 6, 7, 8, 9]

#### Question 5
Add the value 13 to your list.

In [181]:
list1.append(13)

In [182]:
## Just checking to see if 13 is added to list1
list1

[0, 1, 2, 4, 5, 6, 7, 8, 9, 13]

### There are countless other operations you can perform on lists.
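As a minimal sketch of a few of those operations, the cell below uses pop(), remove(), insert() and slicing. The list `letters` and the variable names are made up for illustration and are not part of the exercises above.

```python
# An illustrative list (hypothetical, not one of the lists used above)
letters = ['a', 'b', 'c', 'd', 'e', 'f']

# pop() removes the element at a given position and returns it
popped = letters.pop(0)      # removes 'a'

# remove() deletes the first element equal to the given value
letters.remove('d')

# insert() places a value at a given position
letters.insert(1, 'z')

# slicing returns a new list; the end position is excluded
first_two = letters[0:2]

print(popped)      # a
print(letters)     # ['b', 'z', 'c', 'e', 'f']
print(first_two)   # ['b', 'z']
```

Note that del and pop() work by position, while remove() works by value.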

# Tuples 

## Create a Tuple
Tuples are much like lists, but they are immutable: once a tuple is created, its elements cannot be changed, added or removed. A tuple is written with parentheses instead of square brackets.


In [13]:
my_tuple = (3, 5, 'two')
my_tuple

(3, 5, 'two')

In [14]:
## access the last element in the tuple
my_tuple[-1] 

'two'

In [15]:
one, two, three = my_tuple 

In [184]:
three

'two'

This is called unpacking: each name on the left receives the element at the matching position. It is like saying:
* one = my_tuple[0]
* two = my_tuple[1]
* three = my_tuple[2]

### There are other operations you can perform on tuples
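The cell below is a small sketch of what immutability means in practice, along with two tuple methods, count() and index(). The tuple `point` is a hypothetical stand-in with the same values as my_tuple.

```python
# Illustrative tuple (hypothetical name) with the same values as my_tuple
point = (3, 5, 'two')

# Tuples are immutable: assigning to a position raises a TypeError
try:
    point[0] = 10
    changed = True
except TypeError:
    changed = False

# count() and index() work the same way as they do on lists
n_fives = point.count(5)        # how many times 5 appears
where_two = point.index('two')  # position of the first 'two'

# "Changing" a tuple really means building a new one
longer = point + (7,)

print(changed)     # False
print(n_fives)     # 1
print(where_two)   # 2
print(longer)      # (3, 5, 'two', 7)
```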

# Dictionary

Dictionaries are collections where each element is a key-value pair. Every value has a key (a name). Python uses curly braces to indicate dictionaries.

#### Syntax

my_dict = {"key":"value"}

In [57]:
# create a dictionary of key-value pairs
my_dict = {'Language':'Python',
           'Version':3,
           'Environment':'Colab'}
my_dict

{'Language': 'Python', 'Version': 3, 'Environment': 'Colab'}

In [54]:
## DICTIONARY = key : value
print(my_dict.keys())
print(my_dict.values())

dict_keys(['Language', 'Version', 'Environment'])
dict_values(['Python', 3, 'Colab'])


In [55]:
# Retrieve keys and values
my_dict.items()

dict_items([('Language', 'Python'), ('Version', 3), ('Environment', 'Colab')])

In [56]:
# Value of Version key
my_dict.get('Version')

3
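A few other everyday dictionary operations can be sketched as follows. The dictionary `settings` and the key 'Author' are illustrative assumptions, not part of the example above.

```python
# Illustrative dictionary (hypothetical name) with the same content as my_dict
settings = {'Language': 'Python', 'Version': 3, 'Environment': 'Colab'}

# get() returns None, or a default you supply, when the key is missing
missing = settings.get('Author')
fallback = settings.get('Author', 'unknown')

# Square brackets add a new key or overwrite an existing one
settings['Author'] = 'Hanna'
settings['Version'] = 3.8

# 'in' checks whether a key exists
has_env = 'Environment' in settings

# items() lets you loop over the key-value pairs
for key, value in settings.items():
    print(key, '->', value)
```

Using get() is a safe way to look up a key that might not exist, because square brackets raise a KeyError for a missing key.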

Instead of curly braces, an alternative syntax passes a list of (key, value) tuples as the argument to the dict() function.

In [58]:
# Alternative syntax with list of tuples
my_other_dict = dict([('Language', 'Python'),
                      ('Version', 3.8),
                      ('Environment', ['Spyder', 'Notebook'])])
my_other_dict

{'Language': 'Python', 'Version': 3.8, 'Environment': ['Spyder', 'Notebook']}

# Exercise

#### Question 1
Create a dictionary called dict_ab. The keys should be (Name, School, Course) and the values should be (your_name, Name_of_University, Your_course), e.g. Name_of_University = SPU.

In [190]:
dict_ab = {'Name': 'Hanna',
            'School': 'SPU',
            'Course': 'Data Science'}

dict_ab

{'Name': 'Hanna', 'School': 'SPU', 'Course': 'Data Science'}

### OR

In [191]:
dict_ab = dict([('Name', 'Hanna'),
                ('School', 'Spu'),
                ('Course', 'Data_Science')])
dict_ab

{'Name': 'Hanna', 'School': 'Spu', 'Course': 'Data_Science'}

#### Question 2
Print the keys and the values of your dictionary on separate lines.

In [192]:
print(dict_ab.keys())
print(dict_ab.values())

dict_keys(['Name', 'School', 'Course'])
dict_values(['Hanna', 'Spu', 'Data_Science'])


# Arrays 

Think of a two-dimensional array as a nested list: a list of lists.

In [17]:
ab = [[1,2,3],[4,5,6]] 
ab

[[1, 2, 3], [4, 5, 6]]

NumPy arrays are not Python lists, but they are usually created from them. We will use the NumPy library to create our arrays, imported under the conventional abbreviation np. Arrays are generated with the function np.array(), where the elements are passed as a list.
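An array is built from a list but behaves differently under arithmetic. The cell below is a minimal sketch of that difference; the names `plain` and `arr` are made up for illustration.

```python
import numpy as np

plain = [1, 2, 3]        # an ordinary Python list
arr = np.array(plain)    # a NumPy array built from that list

# On a list, * repeats and + concatenates
doubled_list = plain * 2      # [1, 2, 3, 1, 2, 3]
joined_list = plain + plain   # [1, 2, 3, 1, 2, 3]

# On an array, * and + work element by element
doubled_arr = arr * 2         # array([2, 4, 6])
summed_arr = arr + arr        # array([2, 4, 6])

print(doubled_list)
print(doubled_arr)
```

This element-by-element behaviour is the main reason arrays are preferred over lists for numerical work.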

# Numpy Library

In [18]:
# Make sure that you have installed your numpy library
import numpy as np

In [20]:
## Create an array called ab, with two lists inside.
ab = np.array([[1,2,3],[4,5,6]])
ab

array([[1, 2, 3],
       [4, 5, 6]])

In [21]:
# the sum of your array
ab.sum()

21

In [22]:
## len() gives the size of the first dimension: here, the number of rows (inner lists)
len(ab)

2

In [23]:
## the type is ndarray, short for n-dimensional array
type(ab)

numpy.ndarray

In [24]:
# the shape of the array: we have 2 rows and 3 columns
ab.shape

(2, 3)

### Joining Arrays

In [197]:
## vstack: vertical stack 
n1 = np.array([1,2,5,4])
n2 = np.array([5,6,8,9])
np.vstack((n1,n2))

array([[1, 2, 5, 4],
       [5, 6, 8, 9]])

In [195]:
## hstack: horizontal stack 
np.hstack((n1,n2))

array([1, 2, 5, 4, 5, 6, 8, 9])

In [35]:
## column_stack: stack arrays into separate columns 
np.column_stack((n1,n2))

array([[1, 5],
       [2, 6],
       [5, 8],
       [4, 9]])

In [36]:
## intersect1d: the sorted, unique values common to both arrays
np.intersect1d(n1,n2)

array([5])

# Exercise

### Question 1
Create two arrays called 'arr1' and 'arr2'. Use the values 1,2,3,4,5 for the first array and 6,7,8,9,10 for the second array.

In [199]:
arr1 = np.array([1,2,3,4,5])
arr2 = np.array([6,7,8,9,10])
print(arr1)
print(arr2)

[1 2 3 4 5]
[ 6  7  8  9 10]


In [200]:
arr1

array([1, 2, 3, 4, 5])

In [201]:
arr2

array([ 6,  7,  8,  9, 10])

### Question 2
Get the vertical stack of the two arrays

In [202]:
np.vstack((arr1,arr2))

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

### Question 3
Find the values common to both arrays.

In [203]:
np.intersect1d(arr1,arr2)

array([], dtype=int32)

We get an empty array because arr1 and arr2 have no elements in common.

### You can get the sum, mean, standard deviation of the array or the lists. You can also treat the array as a matrix and perform matrix operations on it.
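As a rough sketch of these operations, the cell below computes a few summary statistics and one matrix product. The array `m` is an illustrative copy of the values used earlier; the `axis` argument chooses whether to work down the columns (axis=0) or along the rows (axis=1).

```python
import numpy as np

# Illustrative 2x3 array (hypothetical name) with the values used earlier
m = np.array([[1, 2, 3], [4, 5, 6]])

total = m.sum()              # sum of every element: 21
col_sums = m.sum(axis=0)     # one sum per column: [5, 7, 9]
row_means = m.mean(axis=1)   # one mean per row: [2., 5.]
spread = m.std()             # standard deviation of all six values

# Matrix multiplication: a (2x3) matrix times its (3x2) transpose gives a 2x2 matrix
product = m @ m.T

print(total)
print(col_sums)
print(row_means)
print(spread)
print(product)
```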

# Pandas Library

Pandas has two core objects: the Series and the DataFrame.

In [37]:
## ensure that you have installed the pandas library
import pandas as pd

# Series 

A Series is a sequence of data values. If a DataFrame is a table, a Series is a list. In fact, you can create one from nothing more than a list.

A Series is, in essence, a single column of a DataFrame. You can assign row labels to a Series by using an index parameter. A Series does not have column names, though; it has only one overall name.

In [51]:
## a Series created from a list; pandas adds the default index shown on the far left
pd.Series([1,2,3,4,5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [204]:
## the index parameter assigns a label to each entry, in the order given
s1 = pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'])
s1

2015 Sales    30
2016 Sales    35
2017 Sales    40
dtype: int64

In [53]:
## the type is pandas series 
type(s1)

pandas.core.series.Series
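The overall name of a Series can be set with the name parameter, and entries can be read either by label or by position. The cell below is a minimal sketch; the Series `sales` and the name 'Product A' are assumptions made for illustration.

```python
import pandas as pd

# Illustrative Series (hypothetical name) with an index and an overall name
sales = pd.Series([30, 35, 40],
                  index=['2015 Sales', '2016 Sales', '2017 Sales'],
                  name='Product A')

by_label = sales['2016 Sales']   # look up an entry by its index label
by_position = sales.iloc[0]      # look up an entry by its position

print(sales.name)     # Product A
print(by_label)       # 35
print(by_position)    # 30
```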

In [68]:
# Create a series from a dictionary
d1 = {'a':10, 'b':20, 'c':40, 'd':45}
d1

{'a': 10, 'b': 20, 'c': 40, 'd': 45}

In [69]:
pd.Series(d1)

a    10
b    20
c    40
d    45
dtype: int64

In [70]:
## pass an index to choose the order of the labels
pd.Series(d1, index = ['c','b','a','d'])

c    40
b    20
a    10
d    45
dtype: int64

As you can see, passing an index lets you choose the order of the entries in your Series. The Series created first had the order [a, b, c, d], and the new one follows the order [c, b, a, d].
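When a Series is built from a dictionary, the index parameter selects entries by label, so it can also pick a subset. A label that is not a key in the dictionary produces a missing value (NaN). The cell below sketches both cases; the label 'z' and the variable names are made up for illustration.

```python
import pandas as pd

# Illustrative dictionary (hypothetical name) with the same content as d1
d = {'a': 10, 'b': 20, 'c': 40, 'd': 45}

# Only the listed labels are kept, in the order given
subset = pd.Series(d, index=['c', 'a'])

# 'z' is not a key in the dictionary, so its value is NaN (missing)
with_missing = pd.Series(d, index=['a', 'z'])

print(subset)
print(with_missing)
```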

# Exercise

#### Question 1
Create a series from a dictionary. The series, called d2, should have the values (15, 20, 45) and the index (First_year, Second_year, Third_year).

In [205]:
d2 = {'First_year':15,'Second_year':20,'Third_year':45}
d2

{'First_year': 15, 'Second_year': 20, 'Third_year': 45}

In [206]:
## Change the dictionary to a series and keep it under the name d2
d2 = pd.Series(d2)
d2

First_year     15
Second_year    20
Third_year     45
dtype: int64

### Many list operations, such as indexing, slicing and len(), also work on a Series.
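The cell below is a small sketch of list-like operations on a Series. One difference is worth knowing: arithmetic on a Series is applied element by element, as with NumPy arrays. The Series `marks` is a hypothetical example that reuses the exercise values.

```python
import pandas as pd

# Illustrative Series (hypothetical name) using the exercise values
marks = pd.Series({'First_year': 15, 'Second_year': 20, 'Third_year': 45})

n = len(marks)             # number of entries, as with a list
first = marks.iloc[0]      # entry at position 0
last_two = marks.iloc[-2:] # slicing by position returns a new Series

# Unlike a list, * multiplies each entry instead of repeating the sequence
doubled = marks * 2
total = marks.sum()

print(n)           # 3
print(first)       # 15
print(last_two)
print(doubled)
print(total)       # 80
```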

#  DataFrame

A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.

In [71]:
## Example
pd.DataFrame({'Yes':[50,21],'No':[131,2]})

| | Yes | No |
|---|---|---|
| 0 | 50 | 131 |
| 1 | 21 | 2 |


In this example, the "0, No" entry has the value of 131. The "0, Yes" entry has a value of 50, and so on.
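A single entry can be read by giving its row label and column name. The cell below is a minimal sketch using .loc and .at; the DataFrame name `votes` is an assumption made for illustration.

```python
import pandas as pd

# Illustrative DataFrame (hypothetical name) with the same content as the example
votes = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# .loc[row label, column name] returns one entry
zero_no = votes.loc[0, 'No']

# .at does the same for a single entry
zero_yes = votes.at[0, 'Yes']

print(zero_no)    # 131
print(zero_yes)   # 50
```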

In [72]:
## Here's a DataFrame whose values are strings:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

| | Bob | Sue |
|---|---|---|
| 0 | I liked it. | Pretty good. |
| 1 | It was awful. | Bland. |


We are using the pd.DataFrame() constructor to generate these DataFrame objects. The syntax for declaring a new one is a dictionary whose keys are the column names (Bob and Sue in this example), and whose values are lists of entries.

The dictionary-list constructor assigns values to the column labels, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the row labels.

The list of row labels used in a DataFrame is known as an Index. We can assign values to it by using an index parameter in our constructor:

In [73]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])

| | Bob | Sue |
|---|---|---|
| Product A | I liked it. | Pretty good. |
| Product B | It was awful. | Bland. |


It's helpful to think of a DataFrame as actually being just a bunch of Series "glued together".
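One way to see the "glued together" idea is to build a DataFrame from two Series and then pull a column back out. The cell below is a sketch under that reading; the variable names are made up for illustration.

```python
import pandas as pd

labels = ['Product A', 'Product B']

# Two Series that share the same index
bob = pd.Series(['I liked it.', 'It was awful.'], index=labels)
sue = pd.Series(['Pretty good.', 'Bland.'], index=labels)

# Each Series becomes one column of the DataFrame
reviews = pd.DataFrame({'Bob': bob, 'Sue': sue})

# Selecting a single column gives back a Series
column = reviews['Bob']

# Selecting a single row by its label also gives a Series
row = reviews.loc['Product B']

print(type(column))
print(row['Sue'])    # Bland.
```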

I hope you noticed that a DataFrame can be built from nothing more than a dictionary of lists.

# Exercise

Create a DataFrame called fruits with the columns (Apples, Bananas) and the rows '2017 Sales' and '2018 Sales'. The Apples column should hold the values (35, 41) and the Bananas column the values (21, 34).

In [208]:
fruits = pd.DataFrame({'Apples': [35,41],'Bananas':[21,34]}, index =['2017 Sales','2018 Sales'])
fruits

| | Apples | Bananas |
|---|---|---|
| 2017 Sales | 35 | 21 |
| 2018 Sales | 41 | 34 |


# See you next week!!