# Introduction to Python Data Types

This tutorial walks you through the data types most commonly used in Python. It introduces the "print" function, which displays the information you pass to it, and the "type" function, which tells you the data type of a variable or object.

__Important notes on variables__:
1) You use variables to store objects or information.
2) Python is case-sensitive: "bioreactor" is different from "Bioreactor".
3) Variable names may contain only alphanumeric characters and underscores, and they cannot <ins>start</ins> with a number.
4) Python reserves some keywords that have fixed functions; you cannot use these as variable names. See https://www.w3schools.com/python/python_ref_keywords.asp
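
The naming rules above can be checked directly in Python. The following is a minimal sketch, not part of the original workshop material: the variable names are made up for illustration, and it relies on the standard-library `keyword` module and the string method `isidentifier()`.

```python
import keyword

# Python is case-sensitive: these are two separate variables
bioreactor = "small tank"
Bioreactor = "large tank"
names_differ = bioreactor != Bioreactor

# str.isidentifier() checks the character rules for names
valid_name = "bioreactor_1".isidentifier()    # letters, digits, underscore
invalid_name = "1_bioreactor".isidentifier()  # starts with a number

# keyword.iskeyword() tells you whether a word is reserved by Python
reserved = keyword.iskeyword("for")

print(names_differ, valid_name, invalid_name, reserved)
```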

__Important other notes__:
1) Python "indexes" (starts counting) at 0 (not at 1)
2) There are markdown cells, like this one, which are written in markdown, not in Python.
3) To keep text from being read as code, put a pound symbol (#) in front of it. This can "comment out" an entire line or be used mid-line; everything after the # is ignored.
4) You can comment out one or more selected lines using Ctrl + / (Windows).
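
As a small illustration of zero-based indexing and comments (a sketch with a made-up variable name, not taken from the workshop files):

```python
name = "Bioreactor"  # a mid-line comment: Python ignores everything after the #

first_letter = name[0]   # index 0 is the first character
second_letter = name[1]  # index 1 is the second character

# print(name)  <- this whole line is commented out, so it does not run
print(first_letter, second_letter)
```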

*Last edited: Isabella Casini, Subaru Muroi 16.04.2025*

In [None]:
# import relevant packages (entire line is commented out)

import pandas as pd # call pandas "pd" for short (midline comment)
import numpy as np

# 1. Text Types
Strings:
In Python, strings are enclosed in single or double quotation marks (' or ") on either side of the text.
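
The short sketch below (with illustrative variable names) suggests why both quote styles exist: the two produce the same string, and using one style on the outside lets the other appear inside the text.

```python
# Single and double quotes create the same kind of object
double_quoted = "Bioreactor 1"
single_quoted = 'Bioreactor 1'
same_text = double_quoted == single_quoted

# Use one kind of quote outside so the other can appear inside the text
label = "the 'control' bioreactor"

print(same_text)
print(label)
```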

In [None]:
# strings
string_ex = "Bioreactor 1"
string_ex2 = 'Bioreactor 2'

In [None]:
# display the output using the print function
print(string_ex)
print("string_ex2", string_ex2)

In [None]:
# Check the type of the data using the type function
type_string_ex = type(string_ex)

print("string_ex data type", type_string_ex)

# print using a different method
print(f"string_ex data type: {type(string_ex)}")

In [None]:
# When working with strings, "\n" and "\t" add a new line and a tab, respectively
# These are useful when formatting and viewing strings
print("\nNew Line \nNew line and \t a tab")

# 2. Numeric Types

Integers are whole numbers with no decimal or fractional part (e.g., -1, 0, 1). Floats have a decimal point, even if the decimal part is zero (e.g., -1.0, 0.0, 1.0), and can represent fractional values (e.g., 7.5).

Python has a built-in integer type, int. The numpy package also provides fixed-size integer types, including int32 and int64 for 32 and 64 bits per integer, respectively.

Python has a built-in float type, float. The numpy package also provides fixed-size float types, including float32 and float64 for 32 and 64 bits per float, respectively.

Some packages are written expecting a certain type of integer or float and will give an error if they receive an unexpected type.
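
To make the bit sizes concrete, here is a minimal sketch (the variable names are illustrative) that inspects how much memory each numpy integer type uses and what happens to the decimal part when a float is converted to an integer.

```python
import numpy as np

# The number in the type name is the number of bits used per value
bytes_int32 = np.int32(7).itemsize   # 4 bytes = 32 bits
bytes_int64 = np.int64(7).itemsize   # 8 bytes = 64 bits

# Fewer bits means a smaller range of representable integers
largest_int32 = np.iinfo(np.int32).max

# Converting a float to an integer drops the decimal part (it does not round)
truncated = int(7.9)

print(bytes_int32, bytes_int64, largest_int32, truncated)
```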

In [None]:
# directly set a value to an integer
a = 7

# directly set a value to a float 
b = 7.0
b2 = 7.5

print(f"'a'= {a}, data type: {type(a)}")
print(f"'b'= {b}, data type: {type(b)}")
print(f"'b2'= {b2}, data type: {type(b2)}")

# Convert a float into an integer (the decimal part is dropped, so 7.5 becomes 7)
a2 = int(b)
a3 = np.int32(b)
a4 = np.int64(b)
a5 = np.int64(b2)

print(f"'a2'= {a2}, data type: {type(a2)}")
print(f"'a3'= {a3}, data type: {type(a3)}")
print(f"'a4'= {a4}, data type: {type(a4)}")
print(f"'a5'= {a5}, data type: {type(a5)}")


## Exercise 1: Combine the variables 'string_ex' and 'b2' into one variable

In [None]:
# Example
concat = b2 + string_ex

# What should we do to fix this error?

# 3. Boolean Types

Booleans take one of two values, "True" or "False". Comparison operators (e.g., "==", "!=") produce booleans, and logical operators (e.g., "and", "or", "not") combine them into expressions.
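
A minimal sketch of building boolean expressions is shown here. The readings and variable names are hypothetical and chosen only for illustration.

```python
temperature = 37.0  # hypothetical reading in degrees Celsius
ph = 6.2            # hypothetical reading

# Comparison operators produce booleans
temp_ok = temperature == 37.0
ph_ok = ph >= 6.5

# Logical operators combine booleans
both_ok = temp_ok and ph_ok
either_ok = temp_ok or ph_ok
ph_not_ok = not ph_ok

print(temp_ok, ph_ok, both_ok, either_ok, ph_not_ok)
```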

In [None]:
# You can assess numeric statements
print("Does 'a' equal 'b'? ", a==b)

print("Is 'b' greater than 'b2'? ", b>b2)

In [None]:
# You can assess string statements
# Check if the two strings are the same
print(f"Is this string: '{string_ex}' the same as this string '{string_ex2}'? {string_ex==string_ex2}")

In [None]:
# Check if a partial string is in another string
partial_str = "Bioreactor"

print(f"Is our partial string '{partial_str}' in '{string_ex}': {partial_str in string_ex}")

# 4. Sequence Types

Sequence types are used to store multiple items, and their order matters. They include lists, tuples, and ranges. We'll focus on lists for now. Lists are written with [ ] and commas to separate entries.

We can:
1) Add and remove items from lists
2) Extract data from lists
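
For orientation, the other two sequence types mentioned above look like this. This is a brief sketch with illustrative names; the rest of this section works with lists only.

```python
# A tuple is written with ( ) and cannot be changed after creation
point = (1, 2, 3)

# A range produces a sequence of integers; the stop value is excluded
counts = list(range(0, 5))

# All sequence types support indexing and len()
first_value = point[0]
n_counts = len(counts)

print(point, counts, first_value, n_counts)
```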


In [None]:
# Create or initialize a list
list1 = [] # this is an empty list

print(f"'list1' ({list1}) data type: {type(list1)}")


In [None]:
# Create a list with values

list2 = ['a', 'b', 'c'] # this is a list of strings
list3 = [1,2,3] # this is a list of integers

print(f"'list2' ({list2}) data type: {type(list2)}")
print(f"'list3' ({list3}) data type: {type(list3)}")

In [None]:
# Add to lists
list2.append('d') # adds a value to the end of the list
list3.insert(3,4) # adds a value at the index position (position, value)
list3.insert(0,0) # adds a value at the index position (position, value)

print(f"'list2': {list2}")
print(f"'list3': {list3}")

In [None]:
# Remove specified values from lists
list2.remove('b') # if that value is not present you will get an error

print(f"'list2': {list2}")

In [None]:
# Extract values of interest based on index
print(f"The 3rd item (index 2) in 'list2': {list2[2]}")

# Check the length of a list (how many entries it has) with the 'len()' function
print(f"'list2' has {len(list2)} entries")

In [None]:
# Add a list to a list
list2.append(list3)

print(f"'list2': {list2}")

In [None]:
# You should see that now you have a list within a list.
# If you index to the list (index = 3) then you extract the full list
# If you want to pull something out of that list, you need to index further in

print(f"The former list3, or the 4th item (index 3) in 'list2': {list2[3]}")
print(f"From the former list3, take the 2nd item (index 1): {list2[3][1]}")


# 5. Mapping Type - Dictionaries

Dictionaries are also collections; they contain keys and values. You can look up a value by using its key. The value can be any data type. Dictionaries are written with { }, with a : to separate a key from its value and commas to separate key:value pairs.
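
The sketch below illustrates that values can be of mixed data types and shows one common way to loop over key:value pairs. The dictionary contents are hypothetical.

```python
# Values can be of any data type: a string, a float, and a list here
reactor = {"name": "Bioreactor 1", "volume_l": 2.5, "strains": ["A", "B"]}

# Look up a value by its key
volume = reactor["volume_l"]

# Loop over key:value pairs with .items()
for key, value in reactor.items():
    print(f"{key}: {value}")

n_pairs = len(reactor)
```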

In [None]:
# Initialize an empty dictionary
dict0 = {} # you can add items to this dictionary as shown below

# Create a dictionary with 'color1' and 'color2' as keys, and 'red' and 'orange' as the respective values
dict1 = {'color1':'red','color2':'orange'}

# Add to a dictionary
dict1['color3'] = 'yellow'

# Find a value of interest
print(f"The value to the key 'color2' is: {dict1['color2']}")

In [None]:
# Extract all of the keys in a dictionary
keys = dict1.keys()
print(f"Keys in dict1: {keys}")

In [None]:
# Extract all of the values in a dictionary
values = dict1.values()
print(f"Values in dict1: {values}")

## Exercise 2: What is wrong with the following code?

In [None]:
# Find a value of interest from a dictionary
print(f"The value to the key 'color4' is: {dict1['color4']}")

# 6. Arrays with Numpy

We will use Numpy for arrays. Arrays can be higher dimensional, but we'll start with 2 dimensions and create the arrays using lists.

Each level of brackets [] is another dimension.
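
As a quick check of the bracket rule, the following sketch (illustrative names only) compares a 1-dimensional and a 2-dimensional array using the `ndim` and `shape` attributes.

```python
import numpy as np

one_level = np.array([1, 2, 3])                 # 1 level of brackets
two_levels = np.array([[1, 2, 3], [4, 5, 6]])   # 2 levels of brackets

# .ndim gives the number of dimensions, .shape gives the size of each one
print(one_level.ndim, one_level.shape)
print(two_levels.ndim, two_levels.shape)
```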

In [None]:
# Create a list of lists. The number of lists is the number of rows, and the number of entries in each list is the number of columns

# Make a 2 x 3 array (2 rows, 3 columns)
list_array = [[1,2,3],[4,5,6]]

# Convert that list into an array.
array1 = np.array(list_array)
print(array1)
print(f"\narray1: \n{array1}")

In [None]:
# The 'shape' attribute returns the dimensions of the array.
# In our 2-D case, it returns (n, m): the number of rows at index 0 and the number of columns at index 1
print(f"\narray1 has {array1.shape[0]} rows and {array1.shape[1]} columns")

In [None]:
# Perform typical matrix operations on an array

# Create a 2 x 3 array of ones, using the np.ones() function
ones_array = np.ones((2,3))
print(ones_array)

# Add the arrays together
sum_array = array1 + ones_array

print(f"\nsum_array: \n{sum_array}")

In [None]:
# Extract data the same way as in lists, indexing in using integers

print(f"\nThe first row: \n{sum_array[0]}")
print(f"\nThe second row and the first column: \n{sum_array[1][0]}")

# 7. Dataframes (Tables) with Pandas

We will use Pandas to create, manipulate, read in, and write out tables, or dataframes as they are called in Pandas. Dataframes, unlike arrays, are always 2-dimensional.

Dataframes can be created manually, built from lists or dictionaries, or read in from files (like csv, tsv, or Excel). There are many ways to build them; below, you will find just a few examples.

Dataframes (unlike arrays) have column headers and an index (row names).
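
A minimal sketch of where the column headers and the index live on a dataframe is given here. The dataframe is a made-up example, separate from the ones built in this section.

```python
import pandas as pd

df_example = pd.DataFrame({"C1": [1, 4], "C2": [2, 5]})

# Column headers and index labels are stored alongside the data
column_names = list(df_example.columns)
row_labels = list(df_example.index)   # default index counts up from 0

print(column_names)
print(row_labels)
print(df_example.shape)  # (rows, columns)
```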


### Dataframe creation

In [None]:
# Manually creating a dataframe - generally not recommended/used

df1 = pd.DataFrame(columns=["C1", "C2", "C3"], data=[[1,2,3],[4,5,6]])
print(df1)

In [None]:
# Creating a dataframe from lists (one list per column)

c1 = [1,4]
c2 = [2,5]
c3 = [3,6]

col_names = ["C1","C2","C3"]

df2 = pd.DataFrame(list(zip(c1,c2,c3)), columns=col_names)
print(df2)

In [None]:
# Creating a Dataframe from a dictionary
# The keys are the column names, and the number of entries in each value is the number of rows
dict_df = {"C1":[1,4],"C2":[2,5],"C3":[3,6]}

df3 = pd.DataFrame(dict_df)
print(df3)

### Accessing Data Within a Dataframe and Manipulating It

In [None]:
# Pull out columns
print(f"\nThe first column: \n{df3['C1']}")

# Pull out a single value by column and row/index, [column name][index label]
print(f"\nThe first column, second row (index 1): \n{df3['C1'][1]}")

In [None]:
# Rename the columns and index

# Copy our dataframe (optional)
df4 = df3.copy()
# print(df4)

# Renaming the columns with a dictionary: old name is the key, and the new name is the value
df4 = df4.rename(columns= {"C1":"c1","C2":"c2","C3":"c3"})

# Rename the index
df4 = df4.rename(index={0:"row_0",1:"row_1"})

print(df4)

In [None]:
# loc indexer - selects by row/column labels (for df3, the row labels are the default integers 0 and 1)
# [row][column] or [row, column] with labels
print(f"\nThe first row df3: \n{df3.loc[0]}")
print(f"\nThe first row df4: \n{df4.loc['row_0']}")

In [None]:
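# df3.loc[0][2] looks up by position in a label-indexed row, which raises a FutureWarning in recent pandas versions
# Use the column label instead
print(f"\nThe first row and last column df3: \n{df3.loc[0, 'C3']}")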
print(f"\nThe first row and last column df4: \n{df4.loc['row_0']['c3']}")
print(f"\nThe first row and last column df4: \n{df4.loc['row_0','c3']}")

In [None]:
# iloc indexer - only takes numerical (positional) indices
# [row,column]
print(f"\nThe last row df3: \n{df3.iloc[1]}")

# Choose a range using the ':'
# Count from the end by using a negative index ('-')
print(f"\nThe last row and last 2 columns df3: \n{df3.iloc[1,-2:]}")

### Mathematical operations and data replacement

In [None]:
# Multiply every value in the dataframe by -1
df4 = df4*-1
print(df4)

In [None]:
# Individually replace values in the dataframe (2 different methods)

# Make the row_1, c2 value NaN instead of -5 in df4
df4.loc['row_1','c2'] = np.nan
print("\n",df4)

# Use the replace function to make the row_0, c1 value NaN instead of -1
df4 = df4.replace([-1],[np.nan])
print("\n",df4)

In [None]:
# Filter out rows or columns that contain a certain value
# Drop the columns that contain a missing (NaN) value
df4 = df4.dropna(axis='columns')
print("\n",df4)

# 8. Reading and Writing out Files with Pandas


## Reading in csv/txt files

You can specify the delimiter (or separator) used in your file. Files ending in .csv typically use commas, while .txt or .tsv files typically use tabs.

You need to specify the location of the file (the path) that you want to read in.
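
To see the effect of the delimiter without needing a file on disk, the sketch below reads the same small table from two in-memory strings. The table contents and column names are hypothetical, and `io.StringIO` stands in for a file path.

```python
import io
import pandas as pd

# Hypothetical file contents; io.StringIO stands in for a file path
csv_text = "time_h,od\n0,0.1\n1,0.2\n"
tsv_text = "time_h\tod\n0\t0.1\n1\t0.2\n"

df_from_csv = pd.read_csv(io.StringIO(csv_text), sep=",")
df_from_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Both delimiters yield the same table
print(df_from_csv.equals(df_from_tsv))
```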

In [None]:
# Read in a csv file
# header=1 uses the second line of the file (row index 1) as the column names
pathcsv = r"C:\Users\uqicasin\Documents\Teaching\Program_Workshop\fake_gdcw_growth_data.csv"
df5 = pd.read_csv(pathcsv,sep=',',header=1,skiprows=0)
print(df5)

In [None]:
# Read in txt file
pathtxt = r"C:\Users\uqicasin\Documents\Teaching\Program_Workshop\fake_gdcw_growth_data_diff.txt"
df6 = pd.read_csv(pathtxt,sep='\t', header=0)
print(df6)

## Writing out a csv file from a dataframe

Note: You can use the same function for .txt files; just change the file extension and the delimiter.
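
The effect of the delimiter on the written output can be previewed as text. This is a sketch with a made-up dataframe: when no path is given, `to_csv` returns the content as a string instead of writing a file.

```python
import pandas as pd

df_example = pd.DataFrame({"C1": [1, 4], "C2": [2, 5]})

# With no path given, to_csv returns the text it would have written
as_csv = df_example.to_csv(sep=",")
as_tab = df_example.to_csv(sep="\t")

print(as_csv)
print(as_tab)
```

The first column of the output is the index; it has no header, so each header line starts with the delimiter.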

In [None]:
# Designate the path out for the file (to where it will be written)
pathout = r"C:\Users\uqicasin\Downloads\df3_out.csv"
df3.to_csv(pathout,sep=',')

## Reading and Writing Excel files

In [None]:
# Reading in an Excel file
pathxls = r"C:\Users\uqicasin\Documents\Teaching\Program_Workshop\fake_gdcw_growth_data_pivoted.xlsx"
df7 = pd.read_excel(pathxls, sheet_name='Growth_Data', header=0, index_col=0) # can select sheet name
print(df7)

## Exercise 3: Edit your dataframe and write it out to an Excel Sheet

hint: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html

1) Drop the Strain_B columns.
2) Make every zeroth time point have a value of 0.