# Python Data Structures

![Python Logo](https://www.python.org/static/community_logos/python-logo-master-v3-TM.png)

The starting point for analysing data is to consider how Python will handle your data. We will start by looking at native Python data structures (that is, data structures that are available by default with the language and do not require additional libraries).

In this part we will look at the core data structures in standard Python and how to work with them. We will then start looking at types of data.

## Lists

A list is an ordered collection of items that can be iterated through. There are several ways to create a list.

The first is to manually define a list:

```python
my_list = [1, 2, 3, 4, 5]
my_list
```

```
[1, 2, 3, 4, 5]
```

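You can also generate a list from a function. For example, `range(start, stop, step)` generates a sequence of numbers from `start` up to, but not including, `stop`, which `list()` then collects into a list: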

```python
list(range(2, 12, 3))
```

```
[2, 5, 8, 11]
```
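Another common way to build a list is a *list comprehension*, which applies an expression to each item of an iterable and can optionally filter items. A minimal sketch (the variable names are only for illustration):

```python
# Square each number produced by range(2, 12, 3)
squares = [n ** 2 for n in range(2, 12, 3)]
print(squares)  # [4, 25, 64, 121]

# Keep only the even numbers from 0 to 9
evens = [n for n in range(10) if n % 2 == 0]
print(evens)  # [0, 2, 4, 6, 8]
```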

Lists can contain objects of different types, including sublists:

```python
my_list = [1, 'a', 3.0, [4, 5.0]]
my_list
```

```
[1, 'a', 3.0, [4, 5.0]]
```

Items in a list can be referenced by their position (the *index*, counting from 0), changed, added or removed:

```python
my_list[2]
```

```
3.0
```

```python
my_list[2] = 4.0
my_list
```

```
[1, 'a', 4.0, [4, 5.0]]
```

```python
my_list.append(6.0)
my_list
```

```
[1, 'a', 4.0, [4, 5.0], 6.0]
```

```python
my_list.remove(6.0)
my_list
```

```
[1, 'a', 4.0, [4, 5.0]]
```
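A few more ways to reference and modify list items: negative indices count back from the end, slices select a range of positions, nested items are reached by chaining indices, and `insert()`/`pop()` add or remove an item at a given position. This sketch rebuilds `my_list` so that it runs on its own:

```python
my_list = [1, 'a', 4.0, [4, 5.0]]

print(my_list[0])      # first item (index 0): 1
print(my_list[-1])     # last item: [4, 5.0]
print(my_list[1:3])    # items at indices 1 and 2: ['a', 4.0]
print(my_list[3][1])   # second item of the nested list: 5.0

my_list.insert(1, 'b')     # insert 'b' at index 1
print(my_list)             # [1, 'b', 'a', 4.0, [4, 5.0]]

removed = my_list.pop(1)   # remove and return the item at index 1
print(removed, my_list)    # b [1, 'a', 4.0, [4, 5.0]]
```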

Operators can also be applied to whole lists: `+` joins (concatenates) two lists, and `*` repeats a list a given number of times.

```python
big_list = my_list + [7, 5, 'f']
big_list
```

```
[1, 'a', 4.0, [4, 5.0], 7, 5, 'f']
```

```python
['a', 'b'] * 5
```

```
['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b']
```
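Beyond `+` and `*`, several built-in functions work on a whole list at once. A short sketch using a list of arbitrary numbers:

```python
numbers = [7, 5, 2, 9]

print(len(numbers))       # number of items: 4
print(5 in numbers)       # membership test: True
print(sum(numbers))       # 23
print(max(numbers))       # 9
print(sorted(numbers))    # a new sorted list: [2, 5, 7, 9]
print(numbers)            # the original is unchanged: [7, 5, 2, 9]
```

`sorted()`, `max()` and `min()` need items that can be compared with each other, so calling them on a mixed list such as `big_list` (which holds numbers, strings and a sublist) raises a `TypeError`.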

A key concept in Python is that an object can be *iterable*: it represents a collection whose items can be visited one at a time, so we can perform the same action on each item using a loop such as a `for` loop. A list is a basic iterable.

```python
for my_item in my_list:
    print(my_item)
```

```
1
a
4.0
[4, 5.0]
```
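Many other objects are iterable too, including strings, `range` objects, tuples and dictionaries. When the position of each item is also needed, the built-in `enumerate()` yields `(index, item)` pairs. A minimal sketch:

```python
my_list = [1, 'a', 4.0, [4, 5.0]]

# enumerate() pairs each item with its index, starting from 0
for position, my_item in enumerate(my_list):
    print(position, my_item)

# A string is also iterable: the loop visits one character at a time
letters = [character for character in 'abc']
print(letters)  # ['a', 'b', 'c']
```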


## Tuple
A similar concept is a tuple. It is created using round brackets rather than square brackets.

```python
my_tuple = (1, 2, 3)
```

The difference between lists and tuples is that a list can be changed after it is created, e.g. by adding extra items (this is called *mutable*), while a tuple cannot be changed after it is created (*immutable*).

```python
my_tuple[1]
```

```
2
```

```python
my_tuple[1] = 4
```

```
TypeError: 'tuple' object does not support item assignment
```
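Because tuples cannot change, they are often used for fixed groups of values. This sketch shows three common patterns: unpacking a tuple into separate variables, using a tuple as a dictionary key (lists cannot be keys because they are mutable), and converting to a list when a modified copy is needed. The key `(0, 0)` and its label are arbitrary examples.

```python
my_tuple = (1, 2, 3)

# Unpacking: assign each element to its own variable
first, second, third = my_tuple
print(second)  # 2

# Tuples of immutable items are hashable, so they can be dictionary keys
labels = {(0, 0): 'origin'}
print(labels[(0, 0)])  # origin

# To "change" a tuple, copy it into a list, modify the list, then convert back
as_list = list(my_tuple)
as_list[1] = 4
new_tuple = tuple(as_list)
print(new_tuple)  # (1, 4, 3)
print(my_tuple)   # the original is untouched: (1, 2, 3)
```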

## Dictionary
Another convenient way to store data in Python is a *dictionary*, which associates each value with a key. For example, you might store the population of each country (in millions) under the country name.

```python
population_data = {'United Kingdom': 66.6,
                   'France': 66.9,
                   'Germany': 83.02,
                   'Spain': 46.94,
                   'Italy': 60.36,
                   'Portugal': 10.28,
                   'Netherlands': 17.28,
                   'Belgium': 11.46,
                  }
```

```python
population_data
```

```
{'United Kingdom': 66.6,
 'France': 66.9,
 'Germany': 83.02,
 'Spain': 46.94,
 'Italy': 60.36,
 'Portugal': 10.28,
 'Netherlands': 17.28,
 'Belgium': 11.46}
```

```python
population_data['France']
```

```
66.9
```
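Looking up a key that is not in the dictionary raises a `KeyError`. A short sketch of two ways to guard against that: the `in` operator checks whether a key exists, and the `get()` method returns a default value instead of raising. It re-creates a shortened copy of `population_data` so that it runs on its own; `'Norway'` is just an example of a missing key.

```python
population_data = {'United Kingdom': 66.6, 'France': 66.9, 'Germany': 83.02}

print('France' in population_data)   # True
print('Norway' in population_data)   # False

# get() returns None (or a chosen default) when the key is missing
print(population_data.get('Norway'))        # None
print(population_data.get('Norway', 0.0))   # 0.0

# Square-bracket lookup of a missing key raises KeyError
try:
    population_data['Norway']
except KeyError as error:
    print('Missing key:', error)   # Missing key: 'Norway'
```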

```python
# Add a new key-value pair to the dictionary
population_data['Switzerland'] = 8.55
```

```python
# items() yields (key, value) pairs; :.2f formats the number with two decimal places
for country_name, population in population_data.items():
    print(f'The population of {country_name} is {population:.2f} million.')
```

```
The population of United Kingdom is 66.60 million.
The population of France is 66.90 million.
The population of Germany is 83.02 million.
The population of Spain is 46.94 million.
The population of Italy is 60.36 million.
The population of Portugal is 10.28 million.
The population of Netherlands is 17.28 million.
The population of Belgium is 11.46 million.
The population of Switzerland is 8.55 million.
```
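Dictionaries also support comprehensions, and `sorted()` can order their items by value. A sketch using a subset of the population figures above: keep only countries with more than 50 million people, then list countries from largest to smallest population.

```python
population_data = {'United Kingdom': 66.6, 'France': 66.9, 'Germany': 83.02,
                   'Spain': 46.94, 'Italy': 60.36, 'Switzerland': 8.55}

# Dictionary comprehension: keep only entries above 50 million
large_countries = {name: pop for name, pop in population_data.items() if pop > 50}
print(large_countries)
# {'United Kingdom': 66.6, 'France': 66.9, 'Germany': 83.02, 'Italy': 60.36}

# Sort the (name, population) pairs by population, largest first
by_population = sorted(population_data.items(), key=lambda item: item[1], reverse=True)
print([name for name, pop in by_population])
# ['Germany', 'France', 'United Kingdom', 'Italy', 'Spain', 'Switzerland']
```

Since Python 3.7, dictionaries keep their insertion order, which is why `large_countries` lists the countries in the same order as `population_data`.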


## Key data types
There are two core types of data that are commonly used in Data Science:
* **Tabular Data** - this is data in rows and columns, as you would expect to see in a spreadsheet.
  * Each row represents a data point, e.g. each employee in a payroll system; each measurement time in a weather observation system
  * Each column represents a feature of the data point, e.g. name, employee ID, department, salary in a payroll system; temperature, wind, sunshine in a weather observation system
* **Gridded Data** - a multi-dimensional array of data, representing a regular grid of measurements.
  * This might be the output of a weather forecast model over the UK.
  * Dimensions could be latitude, longitude and time.
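To make the distinction concrete, here is a rough sketch using only built-in structures. Tabular data maps naturally onto a list of rows, each a dictionary keyed by column name (the two rows below reuse weather measurements that appear later in this notebook). Gridded data maps onto nested lists indexed as `[time][latitude][longitude]`; the grid sizes are arbitrary, every cell starts as a placeholder `0.0`, and `15.5` is a made-up example value.

```python
# Tabular data: one dictionary per row (data point), one key per column (feature)
weather_table = [
    {'time': 1400, 'temperature': 21, 'wind': 2, 'rain': 0},
    {'time': 1500, 'temperature': 24, 'wind': 5, 'rain': 0},
]
# Select one column across all rows
temperatures = [row['temperature'] for row in weather_table]
print(temperatures)  # [21, 24]

# Gridded data: nested lists with one level per dimension (time, latitude, longitude)
n_time, n_lat, n_lon = 2, 3, 4   # arbitrary grid sizes for illustration
grid = [[[0.0 for _ in range(n_lon)] for _ in range(n_lat)] for _ in range(n_time)]

# Set and read the value at time index 1, latitude index 2, longitude index 3
grid[1][2][3] = 15.5   # made-up example value
print(grid[1][2][3])   # 15.5
print(len(grid), len(grid[0]), len(grid[0][0]))  # 2 3 4
```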


### Reading and writing data

Python's "batteries included" philosophy means its standard library offers many options for reading and writing data without having to install third-party libraries.


Let's start by reading example data from a comma-separated values (CSV) file. CSV is a basic storage format, like a simple spreadsheet: each line of the file is a row, and the values within a row are separated by commas.

See the `csv` module documentation: https://docs.python.org/3/library/csv.html

```python
import csv
```

```python
# newline='' lets the csv module handle line endings itself, as its documentation recommends
with open('weather.csv', newline='') as file_in1:
    weather_reader1 = csv.reader(file_in1, delimiter=',')
    # Each row is returned as a list of strings
    for measurement_row in weather_reader1:
        print(measurement_row)
```

```
['time', 'temperature', 'wind', 'rain']
['1400', '21', '2', '0']
['1500', '24', '5', '0']
['1600', '20', '15', '1']
['1700', '22', '5', '15']
['1800', '21', '1', '0']
```
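The first row returned by `csv.reader` is the header. One way to handle it is to consume it with `next()` before looping, converting the remaining string values to integers as you go. To keep the sketch self-contained, it reads the same rows from an in-memory `io.StringIO` object rather than `weather.csv`; with a real file you would pass the open file object instead.

```python
import csv
import io

# In-memory stand-in for weather.csv, containing the rows shown above
weather_text = """time,temperature,wind,rain
1400,21,2,0
1500,24,5,0
1600,20,15,1
1700,22,5,15
1800,21,1,0
"""

weather_reader = csv.reader(io.StringIO(weather_text))
header = next(weather_reader)   # ['time', 'temperature', 'wind', 'rain']

# Convert every value in the remaining rows from str to int
rows = [[int(value) for value in row] for row in weather_reader]

# Look up the column position by name, then pull that column out of every row
temperature_col = header.index('temperature')
temperatures = [row[temperature_col] for row in rows]
print(temperatures)        # [21, 24, 20, 22, 21]
print(max(temperatures))   # 24
```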


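```python
with open('weather.csv', newline='') as file_in1:
    # DictReader uses the header row as keys, so each row becomes a dict
    weather_data1 = csv.DictReader(file_in1, delimiter=',')
    # Index each row by its 'time' value
    weather_dict = {measurement_row['time']: measurement_row for measurement_row in weather_data1}
```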

```python
weather_dict
```

```
{'1400': {'time': '1400', 'temperature': '21', 'wind': '2', 'rain': '0'},
 '1500': {'time': '1500', 'temperature': '24', 'wind': '5', 'rain': '0'},
 '1600': {'time': '1600', 'temperature': '20', 'wind': '15', 'rain': '1'},
 '1700': {'time': '1700', 'temperature': '22', 'wind': '5', 'rain': '15'},
 '1800': {'time': '1800', 'temperature': '21', 'wind': '1', 'rain': '0'}}
```

```python
weather_dict['1600']['temperature']
```

```
'20'
```
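Writing works in a similar way. This sketch uses `csv.DictWriter` to write dictionaries like those in `weather_dict` back out as CSV. It builds a small `weather_dict` by hand (the first two rows shown above) and writes to an in-memory `io.StringIO` buffer so that it runs without touching the disk; to write a real file you would instead open one with `open('weather_out.csv', 'w', newline='')`, where the filename is hypothetical.

```python
import csv
import io

weather_dict = {
    '1400': {'time': '1400', 'temperature': '21', 'wind': '2', 'rain': '0'},
    '1500': {'time': '1500', 'temperature': '24', 'wind': '5', 'rain': '0'},
}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['time', 'temperature', 'wind', 'rain'])
writer.writeheader()                      # write the column names first
writer.writerows(weather_dict.values())   # one CSV line per inner dictionary

print(buffer.getvalue())
```

`fieldnames` fixes the column order, so each inner dictionary is written with its values in that order regardless of how its keys were inserted.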

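We can see we are now starting to get our data organised using Python's basic data structures and standard library. Note that the `csv` module returns every value as a string (`'20'` rather than `20`), so numeric columns need converting with `int()` or `float()` before doing arithmetic on them.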

In our next notebook we will look at loading gridded data using a third-party library.