## Running code cells

Once you open a Colab notebook, you can start writing in code cells. 
The text you are reading now is in a markdown cell, which lets you add explanations alongside your code.

In [16]:
2 + 2

4

In [17]:
# This is a comment
# Code cells can contain multiple lines of code 
2 * 2

4

## Variables and Data Types

One of the most basic things we can do in Python is assign values to **variables**:

In [18]:
text = 'Botany 2022'  # An example of a string
number = 42  # An example of an integer
pi_value = 3.1415  # An example of a float

Here we’ve assigned data to the variables *text*, *number* and *pi_value*, using the assignment operator =. 
To review the value of a variable, we can type the name of the variable into a code cell and press *Shift* + *Return*:

In [37]:
text

'Botany 2022'

The notebook only displays the value of the last line in a code cell.

In [38]:
text 
number

42

To display multiple objects from one cell, you have to tell Python explicitly with the print function.

In [29]:
print(text)
print(number)

Botany 2022
42


Everything in Python has a type. To get the type of something, we can pass it to the built-in **function** type:

In [20]:
type(text)

str

In [21]:
type(number)

int

In [22]:
type(pi_value)

float
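Because every value has a type, it can help to see how values move between types. The sketch below reuses the variables defined earlier and relies only on the built-ins `isinstance`, `str`, `float` and `int`; the names `as_text`, `as_float` and `as_int` are made up for illustration.

```python
number = 42
pi_value = 3.1415

# isinstance checks whether a value has a given type
print(isinstance(number, int))

# Built-in functions convert a value from one type to another
as_text = str(number)      # integer to string
as_float = float(number)   # integer to float
as_int = int(pi_value)     # float to integer (drops the decimal part)

print(type(as_text), type(as_float), type(as_int))
```

Note that `int` truncates rather than rounds, so `as_int` is 3 here.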

Notice that a string (str) is what Python calls text. A string always needs to be enclosed in single or double quotation marks; without them, Python raises an error:

In [39]:
error = Botany 2022

SyntaxError: invalid syntax
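As a small sketch of the quoting rule, the example below shows that single and double quotes produce the same string, and that one kind of quote can wrap text containing the other. The variable names and the note text are invented for illustration.

```python
# Single and double quotes create identical strings
single = 'Botany 2022'
double = "Botany 2022"
print(single == double)

# Wrap the text in double quotes when it contains an apostrophe
note = "the collector's notes"  # hypothetical example text
print(note)
```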

We can also use comparison operators: <, >, ==, !=, <=, >=. The result of a comparison is a **boolean**, which is either True or False.
Notice that we use == to compare two values and = to assign a value to a variable.


In [23]:
3 > 4

False

In [25]:
result = pi_value > number
result

False

In [26]:
type(result)

bool
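To make the difference between `==` and `=` concrete, here is a minimal sketch that stores the results of a few comparisons. The variable names `is_equal`, `is_different` and `in_range` are assumptions chosen for this example.

```python
number = 42
pi_value = 3.1415

# = assigns; == compares and produces a boolean
is_equal = number == 42
is_different = number != 42

# Comparisons can be chained to test a range
in_range = 3 <= pi_value < 4

print(is_equal, is_different, in_range)
print(type(is_equal))
```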

### Collections: Lists and Dictionaries

A **list** is a common data structure that holds an ordered sequence of elements. Each element can be accessed by an index. Note that Python indexes start at 0 instead of 1:

In [None]:
numbers = [1, 2, 3]
numbers[0]

1

To add elements to the end of a list, we can use the append **method**. Methods are a way to interact with an object (a list, for example). We can invoke a method using the dot . followed by the method name and a list of arguments in parentheses. Let’s look at an example using append:

In [None]:
numbers.append(4)
print(numbers)

[1, 2, 3, 4]
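The short sketch below pulls indexing and append together. It also uses a negative index and the built-in `len` function, which are not covered above but are standard Python.

```python
numbers = [1, 2, 3]
numbers.append(4)        # add an element to the end of the list

first = numbers[0]       # indexes start at 0
last = numbers[-1]       # negative indexes count from the end
count = len(numbers)     # number of elements in the list

print(first, last, count)
```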


A **dictionary** is a container that holds pairs of objects: **keys** and **values**.

In [None]:
prunus_common = {'cerasus': 'sour cherry', 'armeniaca': 'apricot'}
prunus_common['armeniaca']

'apricot'

Dictionaries work a lot like lists, except that you look up values by key rather than by position. Where Python numbers list elements with indexes starting at 0, a dictionary uses the keys you choose. You can think of a key as a name or unique identifier for the value it corresponds to.

To add an item to the dictionary we assign a value to a new key:

In [None]:
prunus_common['dulcis'] = 'almond'
print(prunus_common)

{'cerasus': 'sour cherry', 'armeniaca': 'apricot', 'dulcis': 'almond'}
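A few other common dictionary operations can be sketched as follows. The `in` operator and the `get` and `keys` methods are standard Python; the key `'persica'` and the fallback text `'unknown'` are chosen here only to show what happens when a key is missing.

```python
prunus_common = {'cerasus': 'sour cherry', 'armeniaca': 'apricot'}
prunus_common['dulcis'] = 'almond'

# Check whether a key exists before looking it up
has_almond = 'dulcis' in prunus_common

# get returns a fallback value instead of raising an error for a missing key
missing = prunus_common.get('persica', 'unknown')

# keys lists every key in insertion order
species = list(prunus_common.keys())

print(has_almond, missing, species)
```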


## Python packages

A Python package (or library) is a collection of custom functions and data types for use by other programs.

You can import a Python package with the *import* keyword.

A very commonly used Python package for working with data is called *pandas*. Let's import pandas to create a custom dataframe.

In [3]:
import pandas as pd

Let's make a list of dictionaries to feed into pandas.

In [30]:
records = [{'voucher_number':123, 'collector':'Sundre', 'scientific_name':'Prunus cerasus'},
           {'voucher_number':124, 'collector':'Richie', 'scientific_name':'Prunus armeniaca'}]

We can then pass this list of dictionaries to pd.DataFrame to turn it into a dataframe. Each dictionary becomes a row, and the keys become the column names.

In [31]:
df = pd.DataFrame(records)

We can use the type function to find out what the df variable is.

In [32]:
type(df)

pandas.core.frame.DataFrame

For a large dataframe, pandas displays only the first 5 and last 5 rows. Here we only have 2, so both are shown.

In [33]:
df

|   | voucher_number | collector | scientific_name |
|---|---|---|---|
| 0 | 123 | Sundre | Prunus cerasus |
| 1 | 124 | Richie | Prunus armeniaca |
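Once the dataframe exists, a few attributes are handy for a quick look at its structure. The sketch below rebuilds the same dataframe so it runs on its own and uses `shape`, `columns` and column selection with square brackets, all standard pandas features.

```python
import pandas as pd

records = [{'voucher_number': 123, 'collector': 'Sundre', 'scientific_name': 'Prunus cerasus'},
           {'voucher_number': 124, 'collector': 'Richie', 'scientific_name': 'Prunus armeniaca'}]
df = pd.DataFrame(records)

n_rows, n_cols = df.shape                 # (rows, columns)
column_names = list(df.columns)           # column labels taken from the dictionary keys
collectors = df['collector'].tolist()     # select one column and convert it to a list

print(n_rows, n_cols)
print(column_names)
print(collectors)
```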


We can also use the info method to find out more about our dataframe, such as the column names, data types, and number of non-null values. 

In [34]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   voucher_number   2 non-null      int64 
 1   collector        2 non-null      object
 2   scientific_name  2 non-null      object
dtypes: int64(1), object(2)
memory usage: 176.0+ bytes


Once you create a dataframe, you may want to save it as a CSV file. You can do this using the .to_csv method.

In [10]:
df.to_csv('specimen_records.csv')

You can also read a CSV file back into a dataframe using the pd.read_csv function. If you already have a table you're working on, in Excel for example, exporting it as a CSV is a good way to move it into pandas. Notice that the dataframe was saved with the pandas row labels (the index), which now appear as an extra column named "Unnamed: 0".


In [12]:
df1 = pd.read_csv('specimen_records.csv')
df1

|   | Unnamed: 0 | voucher_number | collector | scientific_name |
|---|---|---|---|---|
| 0 | 0 | 123 | Sundre | Prunus cerasus |
| 1 | 1 | 124 | Richie | Prunus armeniaca |


To save without the row labels, pass index=False to .to_csv.

In [35]:
df.to_csv('specimen_records.csv', index=False)

In [36]:
df2 = pd.read_csv('specimen_records.csv')
df2

|   | voucher_number | collector | scientific_name |
|---|---|---|---|
| 0 | 123 | Sundre | Prunus cerasus |
| 1 | 124 | Richie | Prunus armeniaca |
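The effect of the index argument can be checked side by side. This is a minimal sketch that writes to a temporary folder (an assumption made so the example cleans up after itself) and compares the columns read back with and without the saved index.

```python
import os
import tempfile

import pandas as pd

records = [{'voucher_number': 123, 'collector': 'Sundre', 'scientific_name': 'Prunus cerasus'},
           {'voucher_number': 124, 'collector': 'Richie', 'scientific_name': 'Prunus armeniaca'}]
df = pd.DataFrame(records)

with tempfile.TemporaryDirectory() as tmp_dir:
    with_index_path = os.path.join(tmp_dir, 'with_index.csv')
    no_index_path = os.path.join(tmp_dir, 'no_index.csv')

    df.to_csv(with_index_path)               # row labels are written as the first column
    df.to_csv(no_index_path, index=False)    # row labels are left out

    cols_with_index = list(pd.read_csv(with_index_path).columns)
    cols_no_index = list(pd.read_csv(no_index_path).columns)

print(cols_with_index)
print(cols_no_index)
```

Only the file saved with index=False comes back with the same three columns as the original dataframe.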
