# Welcome to the second Python for Ocean Sciences Seminar 

# The Basics and Importing Data 


Today we'll be learning some basic Python syntax and data types, how to read in some common file types, and how to make sense of some common errors. In the session, we can hopefully do this live together... fingers crossed for no major technical failures! 

<img src="images/livetroubleshooting_AH.png" width="800" align="center">

*Artwork by @allison_horst*


<img src="images/codehero_AH.png" width="500" align="right">

**Troubleshooting** 

You will inevitably come across a lot of errors in your journey to learn Python. Luckily, Python and its many packages are *open source*, so there is a lot of support online. 

As a Python user, your first port of call will probably be https://stackoverflow.com/. Most packages also have well-documented online manuals, and there are more specific forums, e.g. https://gis.stackexchange.com/ for GIS-related questions. 

*Artwork by @allison_horst*


# Read in Python packages 

<img src="images/importmodules.png" width="500" align="right">

On its own, the Python language offers only basic operations for data analysis and scientific research. To expand its capability, we need to import some of the packages we installed last time. By convention, all packages used in a script are imported at the start of that script. Do this by running the code below. 

*Tip: You can give packages aliases/another name to simplify your code. You can use any name but there are some standard ones below*

In [1]:
import numpy as np # numpy is the package, np is the alias 
import pandas as pd

If you don't have the package installed in your *activated environment* Python will give you this error: 

```
ModuleNotFoundError: No module named 'numpy'
```

Python code is **case sensitive**. This means none of Numpy, NUMPY, NumPY will work if you want to be using 'numpy'!

*Reminder: To install the numpy Python package, open your Anaconda Prompt, activate your environment and type the following command...* 

``` conda install numpy ``` 
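If you are unsure whether a package is available in the current environment, a quick check along these lines may help before you import it. This is only a sketch: ```is_installed``` is a hypothetical helper name (not part of Python or conda), built on the standard library function ```importlib.util.find_spec```.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find the package in the current environment."""
    # find_spec returns None when a top-level package cannot be found
    return importlib.util.find_spec(package_name) is not None


# "json" ships with Python itself, so it should always be found
print(is_installed("json"))

# a made-up package name that should not exist in any environment
print(is_installed("not_a_real_package_xyz"))
```

If the check returns ```False``` for a package you need, install it with conda as in the reminder above.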

Now we've installed our package we can start coding in Python!

## Basic Arithmetic in Python

Basic arithmetic operations are built into the Python language and are very similar to other languages, e.g: 

- ```+``` for addition
- ```-``` for subtraction
- ```*``` for multiplication
- ```/``` for division
- ```%``` for modulus (returns the remainder)
- ```**``` for exponentiation
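
As a quick illustration of the operators listed above, here is a minimal sketch using two made-up numbers (```x``` and ```y``` are just example variable names):

```python
x = 7
y = 2

print(x + y)   # addition: 9
print(x - y)   # subtraction: 5
print(x * y)   # multiplication: 14
print(x / y)   # division: 3.5 (division always returns a float)
print(x % y)   # modulus: 1 (the remainder of 7 divided by 2)
print(x ** y)  # exponentiation: 49 (7 squared)
```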


We can also output variables and other information to the workspace using ```print()```. For example:

In [2]:
a = 4
b = 5 

a+b
#a-b

# print(a+b) 
# print(a**b) 
# print('a/b = ' + a/b)
# print('a/b = ' + str(a/b))

# c = a+b # Unlike in MATLAB we don't need a semi-colon (;) at the end of a line... 
        #   to stop printing to screen.

9
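
One of the commented-out lines above, ```print('a/b = ' + a/b)```, is worth trying because it fails: Python will not join a string and a number with ```+```. The sketch below catches that error so you can see the message, then shows two ways round it. The f-string version is an addition here, not something used elsewhere in this seminar.

```python
a = 4
b = 5

try:
    message = 'a/b = ' + a/b  # a string plus a float is not allowed
except TypeError as err:
    message = None
    print("TypeError:", err)

# Option 1: convert the number to a string first
print('a/b = ' + str(a/b))

# Option 2: use an f-string, which does the conversion for you
print(f'a/b = {a/b}')
```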


Let's use some more advanced mathematical functions using the numpy library. Remember we imported numpy with the alias "np". You can now call specific functions within the numpy library by using the syntax ```np.```

In [3]:
# mathematical constants 
print(np.pi)
print(np.e)

# trigonometric functions 
angle = np.pi/4
print(np.sin(angle))
print(np.cos(angle))
print('tan(angle) = ' + str(np.tan(angle)))

3.141592653589793
2.718281828459045
0.7071067811865475
0.7071067811865476
tan(angle) = 0.9999999999999999


## Data Types and Structures in Python

Python has your usual data **types** (e.g. strings, floats, integers etc.) but also three very useful data **structures** built into the language. 

- dictionaries: { } 
- lists: [ ] 
- tuples: (item1,...)

*Tip: Check your data type with the command* ``` type() ```

In [4]:
# let's look at some data types 

string_1 = "hello" # you can call variables anything you want...
                   # as long as they don't interfere with functions (e.g type,np,tan etc)
print(string_1, type(string_1))

float_1 = 2.4
float_2 = float(2) # you can convert integers into floats
print(float_1, type(float_1))
print(float_2, type(float_2))

int_1 = 4
int_2 = int(4.2) # you can also convert floats into integers. int() drops the decimal part (truncates towards zero) rather than rounding.
int_3 = np.floor(4.2) # this rounds the number down to the nearest whole number but returns a float (numpy.float64), not an int.
print(int_1, type(int_1))
print(int_2, type(int_2))
print(int_3, type(int_3))

hello <class 'str'>
2.4 <class 'float'>
2.0 <class 'float'>
4 <class 'int'>
4 <class 'int'>
4.0 <class 'numpy.float64'>
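
A note on converting floats to integers: ```int()``` discards the decimal part rather than rounding, which matters most for negative numbers. The short sketch below compares it with the built-in ```round()``` and with ```math.floor()``` from the standard library (used here instead of ```np.floor``` only to keep the example free of extra packages).

```python
import math

value = -4.7

print(int(value))         # -4: the decimal part is dropped (truncation towards zero)
print(math.floor(value))  # -5: always rounds down
print(round(value))       # -5: rounds to the nearest integer

print(int(4.7))           # 4: not 5, because int() does not round
print(round(4.7))         # 5
```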


In [5]:
# let's look at some data structures 

# A list is an ordered, editable (mutable) collection of items 

myList = [1, 2, 3]

# A tuple is a read-only (immutable) data structure. 

myTuple = (1,2,3)

# A dictionary is a collection of "key":value pairs 

myDictionary = {"A":4,"B":6,"C":8}

print(myList,type(myList))
print(myTuple,type(myTuple))
print(myDictionary,type(myDictionary))

[1, 2, 3] <class 'list'>
(1, 2, 3) <class 'tuple'>
{'A': 4, 'B': 6, 'C': 8} <class 'dict'>
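
To see how the three structures behave differently, here is a small sketch. The variable names and values (```depths```, ```position```, ```intensity```) are made up for illustration.

```python
# Lists can be changed after they are created
depths = [10, 20, 30]
depths.append(40)   # add an item to the end
depths[0] = 5       # replace the first item
print(depths)       # [5, 20, 30, 40]

# Tuples cannot be changed: trying to do so raises a TypeError
position = (14.77, -60.55)
try:
    position[0] = 15.0
except TypeError as err:
    print("TypeError:", err)

# Dictionary values are looked up by key rather than by position
intensity = {"Cod": "L", "Herring": "H"}
print(intensity["Herring"])      # H
intensity["Sole"] = "L"          # add a new key:value pair
print(list(intensity.keys()))    # ['Cod', 'Herring', 'Sole']
```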


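We've started to create a lot of variables. To keep track of them, use the ```%whos``` command, which prints every variable in the workspace along with its type. ```%whos``` is an IPython "magic" command, so it works in Jupyter notebooks and IPython but not in a plain Python script. 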

In [6]:
%whos

Variable       Type       Data/Info
-----------------------------------
a              int        4
angle          float      0.7853981633974483
b              int        5
float_1        float      2.4
float_2        float      2.0
int_1          int        4
int_2          int        4
int_3          float64    4.0
myDictionary   dict       n=3
myList         list       n=3
myTuple        tuple      n=3
np             module     <module 'numpy' from '/Us<...>kages/numpy/__init__.py'>
pd             module     <module 'pandas' from '/U<...>ages/pandas/__init__.py'>
string_1       str        hello




## Defining Functions

You can define functions within your code in Python. They have to be defined *before* you use them in the code. Here's a very simple one as an example: 

In [7]:
def addition(a,b):
    c = a + b
    return c

In [8]:
result = addition(2,3)
print(result)

5


You can also output multiple variables from a function, and store them separately:

In [9]:
def addition(a,b):
    c = a + b
    return a,b,c

In [10]:
A, B, C = addition(2,3)

%whos

Variable       Type        Data/Info
------------------------------------
A              int         2
B              int         3
C              int         5
a              int         4
addition       function    <function addition at 0x7fcfdb69c3a0>
angle          float       0.7853981633974483
b              int         5
float_1        float       2.4
float_2        float       2.0
int_1          int         4
int_2          int         4
int_3          float64     4.0
myDictionary   dict        n=3
myList         list        n=3
myTuple        tuple       n=3
np             module      <module 'numpy' from '/Us<...>kages/numpy/__init__.py'>
pd             module      <module 'pandas' from '/U<...>ages/pandas/__init__.py'>
result         int         5
string_1       str         hello
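
Behind the scenes, a function that returns several values actually returns a single tuple, which Python then unpacks into the separate variables. A minimal sketch of this, reusing the ```addition``` function from above (```packed``` and ```total``` are just example names):

```python
def addition(a, b):
    c = a + b
    return a, b, c


# Without unpacking, the three values arrive together as one tuple
packed = addition(2, 3)
print(packed, type(packed))  # (2, 3, 5) <class 'tuple'>

# Unpacking needs one variable per returned value
A, B, C = packed

# An underscore is a common way to ignore values you don't need
_, _, total = addition(10, 20)
print(total)  # 30
```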


Here's another, using a comparison operator (```==```), which returns a boolean (```True``` or ```False```):

In [11]:
def is_it_three(a): 
    return a == 3

In [12]:
is_it_three(2)

False

In [13]:
def is_odd(num):
    if num % 2 != 0:
        result = str(num) + " is an odd number"
    else:
        result = str(num) + " is an even number"
    return result

In [14]:
is_odd(4)

'4 is an even number'
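
Comparisons like the ones above can be combined with the boolean operators ```and```, ```or``` and ```not```. The two functions below are hypothetical examples written for this note, not part of any package.

```python
def is_odd_and_positive(num):
    # both conditions must be True
    return num % 2 != 0 and num > 0


def is_outside_range(value, low, high):
    # only one of the conditions needs to be True
    return value < low or value > high


print(is_odd_and_positive(3))          # True
print(is_odd_and_positive(-3))         # False (odd, but not positive)
print(is_outside_range(12, 0, 10))     # True
print(not is_outside_range(5, 0, 10))  # True (5 is inside the range)
```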

**Indentation** 

Python uses indentation to identify blocks of code. Code within the same block should be at the same indentation level. A Python function is one type of code block (as is a for loop, if statement *etc.*). 

If the code isn't indented properly it might throw up an ```IndentationError``` or cause your analysis to be incorrect. See an example below. 


In [15]:
def is_it_two(a):
return a == 2 

IndentationError: expected an indented block after function definition on line 1 (1369207306.py, line 2)
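
For comparison, here is the same function with the indentation fixed, followed by a sketch of the second problem mentioned above: code that runs without any error but gives the wrong answer because one line is indented too far. ```sum_wrong``` and ```sum_right``` are made-up names for this illustration.

```python
def is_it_two(a):
    return a == 2  # indented, so it belongs to the function


def sum_wrong(values):
    total = 0
    for value in values:
        total = total + value
        return total  # inside the loop: the function stops after the first item


def sum_right(values):
    total = 0
    for value in values:
        total = total + value
    return total  # after the loop: every item has been added


print(is_it_two(2))          # True
print(sum_wrong([1, 2, 3]))  # 1
print(sum_right([1, 2, 3]))  # 6
```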

## Zero-Indexing

Indexing in Python starts at 0 (e.g. the first number in your list is at position 0, not 1). 

In [16]:
my_list = [43,56,78,53]

In [17]:
# list indices
print(my_list[0]) # call the first item in the list "my_list"
print(my_list[1]) # call the second item in the list "my_list"

# negative list indices
print(my_list[-1]) # call the last item in the list "my_list"
print(my_list[-2]) # call the second to last item in the list "my_list"

43
56
53
78


Here we are indexing a *list*. You can find out more about list manipulation and different functions using this cheatsheet: https://www.codecademy.com/learn/learn-python-3/modules/learn-python3-lists/cheatsheet
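
Two things that often catch people out with zero-indexing are slices and the last valid index. The sketch below reuses the same list: a slice ```[start:stop]``` includes ```start``` but stops just before ```stop```, and a list of 4 items has no position 4.

```python
my_list = [43, 56, 78, 53]

print(len(my_list))   # 4 items, at positions 0, 1, 2 and 3
print(my_list[1:3])   # [56, 78]: positions 1 and 2 (position 3 is not included)
print(my_list[:2])    # [43, 56]: the first two items
print(my_list[-2:])   # [78, 53]: the last two items

try:
    my_list[4]        # there is no position 4 in a list of 4 items
except IndexError as err:
    print("IndexError:", err)
```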

## Reading in Data 

You can read hundreds of different types of data file into Python. Here are some ways to read in a few common ones to hopefully get you started analysing data quickly! 

**Reading in a .csv file with the pandas package** 

Use the pandas function 'read_csv' to import a .csv file and put the file path of your data between the quotation marks. See below using a data set of microplastic records in the Caribbean Sea from https://www.ncei.noaa.gov/products/microplastics. 

In [18]:
data = pd.read_csv(r"data/caribbean_microplastics.csv")
type(data)

pandas.core.frame.DataFrame

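*Note: If you copy a file path on Windows, it will contain backslashes, which Python normally treats as the start of an escape sequence (e.g.* ```\n``` *for a new line), and this can cause an error. Putting "r" before the file path, as above, makes it a raw string so that backslashes are read literally.*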
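
To see what the "r" prefix does, compare a regular string with a raw string. The file path below is made up for illustration and does not need to exist.

```python
regular = "data\new_file.csv"   # \n is turned into a new-line character
raw = r"data\new_file.csv"      # the backslash is kept as it is

print(regular)
print(raw)

print("\n" in regular)  # True: the path now contains a line break
print("\n" in raw)      # False

# Forward slashes avoid the problem altogether and also work on Windows
forward = "data/new_file.csv"
print(forward)
```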

Now you can check the data have been read in correctly by using the .head() method. By default it shows the first 5 rows of the data, but you can specify the number of rows between the brackets. 

In [19]:
data.head() # print the first 5 lines of "data"

Unnamed: 0,FID,Date,Latitude,Longitude,Oceans,Regions,SubRegions,Microplastics Measurement (density),Unit,Density Class Range,...,Sampling Method,Short Reference,Long Reference,DOI,Organization,Keywords,NCEI Accession Number,NCEI Accession Link,x,y
0,9,03/11/1992,14.77,-60.55,Atlantic Ocean,Caribbean Sea,,0.0,pieces/m3,0-0.0005,...,Neuston net,Law et al.2010,"Law, K.L., S. Morét-Ferguson, N.A. Maximenko, ...",https://doi.org/10.1126/science.1192321,Sea Education Association,SEA,211007,https://www.ncei.noaa.gov/access/metadata/land...,-6740395.0,1662708.0
1,10,03/11/1992,14.77,-60.55,Atlantic Ocean,Caribbean Sea,,0.0,pieces/m3,0-0.0005,...,Neuston net,Law et al.2010,"Law, K.L., S. Morét-Ferguson, N.A. Maximenko, ...",https://doi.org/10.1126/science.1192321,Sea Education Association,SEA,211007,https://www.ncei.noaa.gov/access/metadata/land...,-6740395.0,1662708.0
2,11,04/11/1992,14.58,-61.33,Atlantic Ocean,Caribbean Sea,,0.00432,pieces/m3,0.0005-0.005,...,Neuston net,Law et al.2010,"Law, K.L., S. Morét-Ferguson, N.A. Maximenko, ...",https://doi.org/10.1126/science.1192321,Sea Education Association,SEA,211007,https://www.ncei.noaa.gov/access/metadata/land...,-6827224.0,1640844.0
3,12,04/11/1992,14.58,-61.33,Atlantic Ocean,Caribbean Sea,,0.00648,pieces/m3,0.005-1,...,Neuston net,Law et al.2010,"Law, K.L., S. Morét-Ferguson, N.A. Maximenko, ...",https://doi.org/10.1126/science.1192321,Sea Education Association,SEA,211007,https://www.ncei.noaa.gov/access/metadata/land...,-6827224.0,1640844.0
4,13,07/11/1992,14.23,-61.18,Atlantic Ocean,Caribbean Sea,,0.0,pieces/m3,0-0.0005,...,Neuston net,Law et al.2010,"Law, K.L., S. Morét-Ferguson, N.A. Maximenko, ...",https://doi.org/10.1126/science.1192321,Sea Education Association,SEA,211007,https://www.ncei.noaa.gov/access/metadata/land...,-6810526.0,1600617.0
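
A few other quick checks are useful straight after reading in a file. The sketch below uses a small hand-typed table in place of the real file so that it runs on its own. The column names and values are a simplified, illustrative subset of the microplastics data, and ```sample``` is just an example variable name.

```python
import io

import pandas as pd

# A tiny stand-in for a .csv file, typed directly into the code
csv_text = """Date,Latitude,Longitude,Density
03/11/1992,14.77,-60.55,0.0
04/11/1992,14.58,-61.33,0.00432
07/11/1992,14.23,-61.18,0.0
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head(2))        # only the first 2 rows
print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # the column names
print(sample.dtypes)         # the data type pandas chose for each column
```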


**Reading a netCDF file**

netCDF (.nc) files, which can hold multidimensional and geographic data stored neatly alongside their metadata, are popular within the ocean sciences. 

To read this filetype into Python we can use the netCDF4 package. Make sure to download it to your activated environment by using the following command in your Anaconda Prompt: 

```
conda install netCDF4
```

We will now read a data set which includes the forecast sea ice extent around Svalbard during December 2022. We'll do this by using the Dataset class within the netCDF4 package. 

In [20]:
import netCDF4 as nc # import the package at the top of your script before using it 

In [21]:
ds = nc.Dataset(r'data/seaiceextent.nc') # read the .nc file in as a netcdf 'Dataset'

In [22]:
print(ds) # print the metadata 

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_CLASSIC data model, file format NETCDF3):
    Conventions: CF-1.6
    institution: NERSC, Jahnebakken 3, N-5007 Bergen, Norway
    source: neXtSIM model fields
    email: nextsimf@nersc.no
    title: neXtSIM-F sea ice forecast, 3 km monthly averaged fields (cmems_mod_arc_phy_anfc_nextsim_P1M-m)
    cmems_product_id: ARCTIC_ANALYSISFORECAST_PHY_ICE_002_011
    FROM_ORIGINAL_FILE__field_type: Files based on file type moorings
    field_date: 2022-12-16
    _CoordSysBuilder: ucar.nc2.dataset.conv.CF1Convention
    references:  
    comment: 
    history: Data extracted from dataset http://localhost:8080/thredds/dodsC/cmems_mod_arc_phy_anfc_nextsim_P1M-m
    dimensions(sizes): x(426), y(483), time(1)
    variables(dimensions): int8 stereographic(), >f8 latitude(y, x), >f8 x(x), >f8 y(y), >f8 sithick(time, y, x), >f8 time(time), >f8 longitude(y, x)
    groups: 


Instead of printing all of the metadata, you can also query the variable names (or 'keys'), which will help with exploratory data analysis:

In [23]:
ds.variables.keys()

dict_keys(['stereographic', 'latitude', 'x', 'y', 'sithick', 'time', 'longitude'])

Now you know the names of the variables, you can assign them to their own global variables:

In [24]:
seaIceThickness = ds['sithick'] # assign netcdf variable to its own global variable

**Read in various text files**

Text files come in various shapes and sizes. Many have the file extension .txt, but not all do (e.g. .dat). If you can open a file with a text editor (e.g. Notepad, Sublime etc.), it is most probably a text file and can be read in as below.

In [25]:
f = open("data/nurserygrounds.txt", "r")
print(f.read())

FID,Species,Intensity
Nursery_Grounds_2010.1,Cod,L
Nursery_Grounds_2010.2,Cod,H
Nursery_Grounds_2010.3,Spurdog,L
Nursery_Grounds_2010.4,Spurdog,H
Nursery_Grounds_2010.21,Undulate ray,L
Nursery_Grounds_2010.22,Blue whiting,L
Nursery_Grounds_2010.5,Tope shark,L
Nursery_Grounds_2010.6,Herring,L
Nursery_Grounds_2010.7,Herring,H
Nursery_Grounds_2010.8,European hake,L
Nursery_Grounds_2010.9,Ling,L
Nursery_Grounds_2010.10,Mackerel,H
Nursery_Grounds_2010.11,Mackerel,L
Nursery_Grounds_2010.12,Anglerfish,L
Nursery_Grounds_2010.13,Anglerfish,H
Nursery_Grounds_2010.14,Plaice,L
Nursery_Grounds_2010.15,Sandeel,L
Nursery_Grounds_2010.16,Spotted ray,L
Nursery_Grounds_2010.17,Common skate,L
Nursery_Grounds_2010.18,Sole,L
Nursery_Grounds_2010.19,Sole,H
Nursery_Grounds_2010.20,Thornback ray,L
Nursery_Grounds_2010.23,Blue whiting,H
Nursery_Grounds_2010.24,Whiting,L
Nursery_Grounds_2010.25,Whiting,H



It's good practice to close your text file after you've used it. 

In [26]:
f.close()

This method has other options like reading in only parts of the file or line by line. Find out more here: https://www.w3schools.com/python/python_file_open.asp
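
A common alternative is to open the file inside a ```with``` block, which closes the file for you when the block ends, even if an error occurs part-way through. The sketch below first writes a small temporary file so that it runs on its own. The file name ```demo_nursery.txt``` and its three lines are made up for illustration, based on the layout of the nursery grounds file.

```python
import os
import tempfile

# Create a small example file in a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), "demo_nursery.txt")
with open(demo_path, "w") as f:
    f.write("FID,Species,Intensity\n")
    f.write("Nursery_Grounds_2010.1,Cod,L\n")
    f.write("Nursery_Grounds_2010.2,Cod,H\n")

# Read the file back in, one line at a time
with open(demo_path, "r") as f:
    header = f.readline().strip()                    # the first line only
    rows = [line.strip().split(",") for line in f]   # the remaining lines

print(header)
print(rows)
print(f.closed)  # True: no need to call f.close()
```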

If well structured, you can also read .txt files in similarly to .csv files using pandas:

In [27]:
df = pd.read_csv(r"data/nurserygrounds.txt")

In [28]:
df.head()

Unnamed: 0,FID,Species,Intensity
0,Nursery_Grounds_2010.1,Cod,L
1,Nursery_Grounds_2010.2,Cod,H
2,Nursery_Grounds_2010.3,Spurdog,L
3,Nursery_Grounds_2010.4,Spurdog,H
4,Nursery_Grounds_2010.21,Undulate ray,L


**Reading .mat files**

You can even read MATLAB data files (.mat) into Python. To do this you'll need to install and import the ```scipy``` package. Install it using the Anaconda prompt by typing: 

```
conda install scipy 
```

In [29]:
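import scipy.io # import the io submodule explicitly; with some older SciPy versions "import scipy" alone may not make scipy.io available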

In [30]:
mat = scipy.io.loadmat('data/double.mat')

In [31]:
print(mat)

{'__header__': b'MATLAB 5.0 MAT-file, Platform: MACI64, Created on: Mon Mar 25 21:03:23 2019', '__version__': '1.0', '__globals__': [], 'A': array([[0.81472369, 0.15761308, 0.6557407 , 0.70604609, 0.43874436,
        0.27602508, 0.75126706, 0.84071726, 0.35165951, 0.07585429],
       [0.90579194, 0.97059278, 0.03571168, 0.03183285, 0.38155846,
        0.67970268, 0.25509512, 0.25428218, 0.83082863, 0.05395012],
       [0.12698682, 0.95716695, 0.84912931, 0.27692298, 0.76551679,
        0.655098  , 0.50595705, 0.81428483, 0.58526409, 0.53079755],
       [0.91337586, 0.48537565, 0.93399325, 0.04617139, 0.7951999 ,
        0.16261174, 0.69907672, 0.24352497, 0.54972361, 0.77916723],
       [0.63235925, 0.80028047, 0.67873515, 0.09713178, 0.1868726 ,
        0.11899768, 0.89090325, 0.92926362, 0.91719366, 0.93401068],
       [0.0975404 , 0.14188634, 0.75774013, 0.82345783, 0.4897644 ,
        0.49836405, 0.95929143, 0.34998377, 0.28583902, 0.12990621],
       [0.27849822, 0.42176128, 0.743

# Next Seminar: Introduction to Data Analysis using Python

# 6th of February, 2 pm - 3 pm, Sarah Jones Conference Room