<img src="https://teaching.bowyer.ai/sdsai/resources/0/img/IMPERIAL_logo_RGB_Blue_2024.svg" alt="Imperial Logo" width="500"/><br /><br />

Programming for Data Science and AI
==============
### SURG70098 - Surgical Data Science and AI
### Stuart Bowyer

## Technical Module Structure and Materials

### Structure
* Aims to go from zero to practice and confidence in using programming to explore clinical data
* This course is **extremely** ambitious (dense) in technical content
* To be successful, you will need to make use of your **self-study time** (150 hours)
* You should be:
    * revisiting lecture material
    * completing tutorial exercises
    * working through the reading list

### Materials
* Lecture notes are available as PDFs on the Blackboard page
* Notes all contain a link to the source code in Google Colab where you can run them yourself
* Tutorial solutions will be released on Monday on Blackboard


## Intended Learning Outcomes
1.  Understand the importance of programming in data science and AI, and the choice of programming language
1.  Be able to set up a Python notebook-based programming project
1.  Perform basic programming tasks in Python 
1.  Write and structure Python code using good practice


## Session Outline
1.  [Programming Fundamentals](#programming-fundamentals)
1.  [Python Development Environment](#python-development-environment)
1.  [Python Basic Concepts](#basic-concepts)
1.  [Wrap Up and Generative AI Usage](#wrap-up)

# Programming Fundamentals
An introduction to what programming is, and how we are going to use it

## Digital Computation
<table style='table-layout: fixed; width: 100%; margin-top: 0;'>
  <tbody>
    <tr>
      <td>We want ...</td>
      <td>We have ...</td>
      <td>Therefore, we ...</td>
    </tr>
    <tr>
      <td>
        <img src="https://teaching.bowyer.ai/sdsai/resources/1/img/Surgeon_and_Data_-_ImageFx.jpeg">
      </td>
      <td>
        <img src="https://live.staticflickr.com/3424/3201183945_9aca75b8ed_b.jpg">
        <p class="attribution"><small>"<a rel="noopener noreferrer" href="https://www.flickr.com/photos/83542829@N00/3201183945">Intel Core 2 Duo E7300 CPU</a>" by <a rel="noopener noreferrer" href="https://www.flickr.com/photos/83542829@N00">William Hook</a> is licensed under <a rel="noopener noreferrer" href="https://creativecommons.org/licenses/by-sa/2.0/?ref=openverse">CC BY-SA 2.0 <img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg" style="height: 1em; margin-right: 0.125em; display: inline;" /><img src="https://mirrors.creativecommons.org/presskit/icons/by.svg" style="height: 1em; margin-right: 0.125em; display: inline;" /><img src="https://mirrors.creativecommons.org/presskit/icons/sa.svg" style="height: 1em; margin-right: 0.125em; display: inline;" /></a>.</small></p>
      </td>
      <td>
        <img src="https://live.staticflickr.com/4836/44661017890_eb4f288ffe_b.jpg">
        <p class="attribution"><small>"<a rel="noopener noreferrer" href="https://www.flickr.com/photos/47121680@N00/44661017890">Python Source Code</a>" by <a rel="noopener noreferrer" href="https://www.flickr.com/photos/47121680@N00">joncutrer</a> is marked with <a rel="noopener noreferrer" href="https://creativecommons.org/publicdomain/zero/1.0/?ref=openverse">CC0 1.0 <img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg" style="height: 1em; margin-right: 0.125em; display: inline;" /><img src="https://mirrors.creativecommons.org/presskit/icons/zero.svg" style="height: 1em; margin-right: 0.125em; display: inline;" /></a>.</small></p>
       </td>
    </tr>
    <tr>
      <td>
        To create a data science or AI application that can:
        <ul>
          <li>Perceive some input/s</li>
          <li>Possibly combine it with some other prior information</li>
          <li>Output some decision or prediction</li>
          <li>Without us having to help it run manually</li>
        </ul>
      </td>
      <td>
        Computers based on processors that can do:
        <ul>
          <li>Arithmetic (Add, subtract, multiply, divide)</li>
          <li>Logic (AND, OR, NOT, compare)</li>
          <li>Read/write</li>
        </ul>
      </td>
      <td>
        Describe our requirements/intent as a sequence of instructions that can be applied by the computer.
      </td>
    </tr>
  </tbody>
</table>

## Python 3 (Programming Language)
*   Simple high-level syntax - relatively easy for beginners to write and intuitive to understand
*   Very widely used (particularly in machine learning) - lots of online resources, community and libraries
*   Interpreted (code translation happens at runtime) - allows quick prototyping and debugging
*   Cross-platform and open source (will happily run on Windows, macOS, the web, etc.) - anyone, anywhere can use it for free

In [2]:
numbers = [10, 5, 20, 8, 15]

if max(numbers) > 15:
    print("The maximum number is greater than 15")

The maximum number is greater than 15


* See the lecture note appendices for information on 'no-code' and other programming languages

# Python Development Environment

## Using Google Colab
### Getting Started
*   I suggest using Google Chrome browser
*   https://colab.research.google.com
*   Sign in with your google account
*   Create a new notebook by clicking the '+ New notebook'
*   Rename the new notebook to something useful (suggest 'SDSAI Tutorial 1')

### Running Code
![Running code in Colab](https://teaching.bowyer.ai/sdsai/resources/1/img/Colab_Running_Code.png)

### Adding Code Blocks
![Adding code blocks in Colab](https://teaching.bowyer.ai/sdsai/resources/1/img/Colab_Add_Blocks.png)

### Adding Text Blocks
![Adding text blocks in Colab](https://teaching.bowyer.ai/sdsai/resources/1/img/Colab_Add_Markdown.png)

### Managing Runtime
![Managing runtime in Colab](https://teaching.bowyer.ai/sdsai/resources/1/img/Colab_Manage_Runtime.png)

### Considerations and Limitations
*   **As notebooks retain data you should NEVER use them with data for which you do not have permission to put in an overseas online repository**
*   Notebooks teach/encourage a specific way of programming that conflicts with some advanced programming, packaging, and deployment of code
*   Free Colab notebooks have limited resources (approx. 2.2GHz CPU, 12 GB RAM, and 110 GB disk)
*   Colab notebook runtimes have a limited lifetime (2-3 days at maximum, 30 mins of inactivity)
*   Risks of accruing costs quickly if care is not taken

# Basic Concepts
This section of the tutorial covers variables, data types, operators and control flow.

## Comments
*   Comments are text notes mixed in with the Python code that are not interpreted or run by Python.
*   You should use comments to annotate code to help other people, or yourself later on, understand its function.
*   I will use these to help demonstrate the purpose of different bits of code.

In [3]:
# This is a comment, and running this doesn't do anything

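# Comments can be split
# across multiple lines

# Triple-quoted text is strictly a string rather than a comment, but a string
# left on its own line is evaluated and then ignored, so it is often used like one
""" Comments can also be wrapped in triple quote marks """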

"""
Triple quote marks can even be used
across multiple lines

with spaces.
"""

'\nTriple quote marks can even be used\nacross multiple lines\n\nwith spaces.\n'

## Variables
*   Variables store data in a Python program. They can be created, modified and used.
*   Variable names are very flexible; however, PEP 8 recommends that names should be "lowercase, with words separated by underscores as necessary to improve readability" (https://peps.python.org/pep-0008/#function-and-variable-names)

In [1]:
# Create two variables 'a' and 'b' with the values 10 and 20
a = 10
b = 20

# Print the value of the variables
print(a)
print(b)

10
20


*   Once created, variables can be overwritten with new data
*   Note that variable names are case sensitive (i.e. `A` is not the same as `a`)

In [5]:
# Variables can be overwritten, even with totally different data
a = "Tree"

print(a)

# Variables are case sensitive
A = "Flower"

print(a)
print(A)

Tree
Tree
Flower


## Types
*   Variables in Python all have 'types'. These define the type of data they can hold.
*   Python is dynamically typed, which means a variable's type can change, but it will only do this when your code explicitly asks for it.
*   The types below are some of the most common built-in ones; many more are built in or provided by libraries.
*   **Types are important because they define what 'kind of thing' you can do with a variable**
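
To make the last two points concrete, here is a minimal sketch (the variable names are illustrative and not part of the course material). It shows that the same `+` operator behaves differently depending on the type of its operands, and that a variable's type only changes when the code reassigns it.

```python
# The same operator does different things for different types
total = 2 + 3          # int + int -> arithmetic addition
joined = "2" + "3"     # str + str -> concatenation (joining text)

print(total, type(total))
print(joined, type(joined))

# Dynamic typing: the type follows whatever value was last assigned
value = 42
first_type = type(value)
value = "forty-two"    # explicit reassignment changes the type
second_type = type(value)

print(first_type, second_type)
```

Mixing the two (e.g. `2 + "3"`) raises a `TypeError`, which is why knowing the type of your data matters.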

### Numerical Types

In [6]:
number = 42       # a number, specifically an 'integer'
decimal = 3.14159 # a 'float', i.e. a floating point number with a decimal value

print(type(number))
print(type(decimal))

<class 'int'>
<class 'float'>


### Logical Types

In [None]:
boolean = True  # a true logical bool (boolean) - note capitalisation
boolean = False # a false logical bool (boolean) - note capitalisation

print(type(boolean))

<class 'bool'>


### Sequence Types
*   Sequences are ordered collections of other data types
*   Other types exist but are beyond the scope here

In [7]:
# Sequence types
lst = [1, 1, 2, 3, 5, 8]  # a list of any other type
tup = (2, 3, 5, 7, 11)    # a tuple (immutable) list of any other type
string = "elephant"       # a str (string), i.e. a sequence of characters
lstlst = [1, [10, 20], 2] # you can even put lists in lists (in lists ...)
lstmix = [1, "lion", 2]   # you can even mix types within lists

print(type(lst))
print(type(tup))
print(type(string))
print(type(lstlst))
print(type(lstmix))

<class 'list'>
<class 'tuple'>
<class 'str'>
<class 'list'>
<class 'list'>


### Mapping Types
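*   Mappings are collections of 'key': 'value' pairs, where values are looked up by key rather than by position (since Python 3.7, a `dict` keeps its items in insertion order)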
*   They are very helpful for retrieving information

In [9]:
dictionary = {            # a dict (dictionary) mapping of keys to values
    "firstname": "John",
    "lastname": "Doe",
    "age": 23
}
print(type(dictionary))
print(dictionary)
print(dictionary["firstname"])
print(dictionary["age"])

<class 'dict'>
{'firstname': 'John', 'lastname': 'Doe', 'age': 23}
John
23
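
Beyond looking up a value, a few other dictionary operations come up constantly. The sketch below uses a hypothetical `record` dictionary (mirroring the one above) to show updating a value, adding a key, testing for a key with `in`, and using `.get()` to supply a default for a missing key.

```python
# Hypothetical patient record, mirroring the dictionary above
record = {"firstname": "John", "lastname": "Doe", "age": 23}

# Update an existing value and add a new key-value pair
record["age"] = 24
record["ward"] = "A1"

# Check whether a key exists before using it
has_ward = "ward" in record
has_mrn = "mrn" in record

# .get() returns a default instead of raising a KeyError for a missing key
mrn_value = record.get("mrn", "unknown")

print(record)
print(has_ward, has_mrn, mrn_value)
```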


### Lists
*   Elements (items) within a list can be accessed in several ways
*   These accessor functions also work for other sequences

#### List Item Access

In [10]:
# Create a list of patients
patients = ['Oliver', 'Amelia', 'Noah', 'Emma', 'Liam']

# List items are accessed with square brackets (Note indices start at 0)
print('The first patient is', patients[0])
print('The second patient is', patients[1])

# Sections of a list can be selected (sliced)
print('The second and third patients are', patients[1:3])

# Lists can be indexed in reverse - allowing easy access to the last elements
print('The last patient is', patients[-1])
print('The penultimate patient is', patients[-2])

The first patient is Oliver
The second patient is Amelia
The second and third patients are ['Amelia', 'Noah']
The last patient is Liam
The penultimate patient is Emma
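
Slices have a few more forms than the `[1:3]` shown above. As a short sketch using the same `patients` list: either end of the slice can be omitted, negative indices work inside slices, and an optional third value sets the step.

```python
patients = ['Oliver', 'Amelia', 'Noah', 'Emma', 'Liam']

first_two = patients[:2]      # start omitted -> from the beginning
from_third = patients[2:]     # end omitted -> to the end of the list
last_two = patients[-2:]      # negative indices also work in slices
every_other = patients[::2]   # the third value is the step

print(first_two)
print(from_third)
print(last_two)
print(every_other)
```

Note that the end index of a slice is always excluded, so `patients[:2]` returns the items at indices 0 and 1.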


#### List Built-in Functions
*   Functions are reusable bits of code that can take an input and/or return something (e.g. `print()`)
*   We will cover functions in more detail later on
*   There are several functions that you can use on lists to get information about them

In [11]:
# You can find out the length of a list with the 'len' function
print('Number of patients is', len(patients))

# You can sort (order alphabetically) the list
print('Sorted list is', sorted(patients))

Number of patients is 5
Sorted list is ['Amelia', 'Emma', 'Liam', 'Noah', 'Oliver']


#### List Methods
*   Some functions are called 'methods' of the list object and can be found here: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists
*   These are functions that are attached to the list object so are called in a different way
*   Note that these methods modify the original list itself

In [12]:
# You can sort a list (alphabetically) with the 'sort' method
patients.sort()
print('Sorted list', patients)

# You can add items
patients.append('Zoe')
print('Appended list', patients)

# And remove items - based on their index
patients.pop(0)
print('Popped list', patients)

# You can reverse the list
patients.reverse()
print('Reversed list', patients)

Sorted list ['Amelia', 'Emma', 'Liam', 'Noah', 'Oliver']
Appended list ['Amelia', 'Emma', 'Liam', 'Noah', 'Oliver', 'Zoe']
Popped list ['Emma', 'Liam', 'Noah', 'Oliver', 'Zoe']
Reversed list ['Zoe', 'Oliver', 'Noah', 'Liam', 'Emma']


### Strings
*   Manipulating text strings is very common in programming and specifically in data science

#### String Character Access
*   The characters within a string can be accessed in the same way as lists

In [13]:
diagnosis = "N18.2 - Chronic kidney disease, stage 2"

# First character
print(diagnosis[0])
# Last character
print(diagnosis[-1])
# Set of characters
print(diagnosis[8:30])

# As with other sequences, you can also use the 'len' function to find the length
print('Length of string', len(diagnosis))

N
2
Chronic kidney disease
Length of string 39


#### String Methods

*   There are many methods for strings that make manipulating them easier - https://docs.python.org/3/library/stdtypes.html#str
*   Note the difference between the previous list methods and these string methods.
    *   Strings are immutable (unchangeable) therefore the methods return a modified copy of the original string.
    *   Lists are mutable (changeable) therefore the methods modify the original variable. These are called 'in-place' methods.
    *   Always check the documentation

In [14]:
# Change the case
print('Upper case:', diagnosis.upper())
print('Lower case:', diagnosis.lower())

# Split the string at a given point - creating a list of strings
diagnosis_split = diagnosis.split(' - ')
print('Diagnosis code:', diagnosis_split[0])
print('Diagnosis description:', diagnosis_split[1])

# Replace parts of the string
print('Diagnosis replaced:', diagnosis.replace("Chronic kidney disease", "CKD"))

# f-strings (formatted string literals) make printing strings with variables easier
# There are several other ways to print formatted strings
print(f'The diagnosis is: {diagnosis}')

Upper case: N18.2 - CHRONIC KIDNEY DISEASE, STAGE 2
Lower case: n18.2 - chronic kidney disease, stage 2
Diagnosis code: N18.2
Diagnosis description: Chronic kidney disease, stage 2
Diagnosis replaced: N18.2 - CKD, stage 2
The diagnosis is: N18.2 - Chronic kidney disease, stage 2
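
The difference between string methods (which return a copy) and in-place list methods is a common source of bugs, so here is a small sketch of both side by side. The `names` list is illustrative only.

```python
diagnosis = "N18.2 - Chronic kidney disease, stage 2"
names = ['Noah', 'Emma', 'Liam']   # illustrative list

# String methods return a new string; the original is unchanged
shouted = diagnosis.upper()
print(diagnosis)
print(shouted)

# In-place list methods change the list itself and return None
returned = names.sort()
print(names)
print(returned)

# To keep the result of a string method, assign it to a variable
diagnosis_short = diagnosis.replace("Chronic kidney disease", "CKD")
print(diagnosis_short)
```

A typical mistake is writing `names = names.sort()`, which replaces the list with `None`.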


### ❓ Types Quick Check
1. What would be the type of this variable:  
    ```which_type_1 = '201'```
1. What would be the type of this variable:  
    ```which_type_2 = [1]```
1. What would be the type of this variable:  
    ```which_type_3 = 4 / 3```

## Operators

### Arithmetic Operators
Arithmetic manipulation of data is fundamental to developing software applications.

In [15]:
a = 11
b = 3

print("a + b =", a + b)   # addition
print("a - b =", a - b)   # subtraction

print("a * b =", a * b)   # multiplication
print("a / b =", a / b)   # division
print("a // b =", a // b) # integer division
print("a % b =", a % b)   # modulo (remainder)

print("a ** b =", a ** b) # Exponent

a + b = 14
a - b = 8
a * b = 33
a / b = 3.6666666666666665
a // b = 3
a % b = 2
a ** b = 1331
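
Integer division and modulo are often used together. As an illustrative sketch (the duration value is made up), they can split a total number of minutes into whole hours and leftover minutes.

```python
# Hypothetical procedure duration in minutes
total_minutes = 135

hours = total_minutes // 60     # whole hours
minutes = total_minutes % 60    # minutes left over

print(f"{total_minutes} minutes is {hours} h {minutes} min")
```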


#### Operator Precedence
*   ❗Make sure you consider the order of precedence of arithmetic operators, as you would for conventional maths
*   Precedence applies to all operators in Python, not only the arithmetic ones
*   The full order is available here: https://docs.python.org/3/reference/expressions.html#operator-precedence

In [16]:
# These two lines are not the same because multiplication has a higher precedence than addition
print(1 + 2 * 3)
print((1 + 2) * 3)

7
9


### Comparison Operators
*   Comparing data values is a fundamental element of computation
*   Standard mathematical comparisons are available



In [18]:
age = 5

print("age == 4:", age == 4)   # equality (equal to)
print("age != 4:", age != 4)   # inequality (not equal to)
print("age > 4:", age > 4)     # greater than
print("age >= 4:", age >= 4)   # greater than or equal to
print("age < 4:", age < 4)     # less than
print("age <= 4:", age <= 4)   # less than or equal to

age == 4: False
age != 4: True
age > 4: True
age >= 4: True
age < 4: False
age <= 4: False


#### Comparison Outputs
*   The output of a comparison operator is a simple boolean (True/False)
*   Outputs can be combined in other functions or written to variables
*   ❗Note the difference between double `==` and single `=` equals symbols.
    *   Double is a comparison operator, and single is the assignment operator.
    *   A single equals `=` assigns a value, like putting a label on a box.
    *   A double equals `==` asks a question, 'Are these two things the same?', and gives a True or False answer.


In [19]:
# Store the boolean result of the comparison in a variable
result = age == 4
print(result)

False


### Logical Operators
*   Logical operators allow you to combine boolean results to form more complex expressions.
*   As with arithmetic operators, logical operators have an order of precedence https://docs.python.org/3/reference/expressions.html#operator-precedence

In [20]:
asthma = False
female = True

# 'and' operator returns True if and only if inputs are both True
print("asthma and female =", asthma and female)

# 'or' operator returns True if either input (or both) is True
print("asthma or female =", asthma or female)

# 'not' operator returns the negation of the input
# (i.e. True => False / False => True)
print("not asthma =", not asthma)
print("female and not asthma =", female and not asthma)

asthma and female = False
asthma or female = True
not asthma = True
female and not asthma = True
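
The order of precedence for the logical operators is `not`, then `and`, then `or`. The sketch below uses hypothetical flags, with values chosen so that the grouping changes the answer, to show why brackets are worth adding whenever the intent could be misread.

```python
# Hypothetical patient flags
diabetic = True
smoker = False
over_65 = False

# 'and' binds more tightly than 'or', so this is read as
# diabetic or (smoker and over_65)
without_brackets = diabetic or smoker and over_65
explicit = diabetic or (smoker and over_65)

# Grouping the 'or' first gives a different answer
regrouped = (diabetic or smoker) and over_65

# 'not' binds most tightly: this is (not diabetic) and smoker
not_first = not diabetic and smoker
not_grouped = not (diabetic and smoker)

print(without_brackets, explicit, regrouped)
print(not_first, not_grouped)
```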


### ❓ Operators Quick Check
1. What would be result of:  
    ```2 ** 3```
1. What would be result of:  
    ```9 - 4 // 3```
1. What would be result of:  
    ```not (True and False)```
1. What would be result of:  
    ```not ('tree' != 'cat') or (6 < 7) or (4724 / 63 > 75)```

## Type Conversion (Casting)
*   There are often times when dealing with real data when you need to convert data between types
*   Some conversions are obvious, others might be counterintuitive
*   To understand each, see the detail in the documentation: https://docs.python.org/3/library/functions.html
*   Here we demonstrate casting by building a BMI calculator
    *   `input()` is a built-in function that allows you to receive an input from the user

In [21]:
# The input function, reads an input value from the user
weight = input('Input weight (kg)... ')
height = input('Input height (m)... ')
print("Weight is", weight)
print("Height is", height)

# Apply arithmetic operators to compute BMI (THIS DOES NOT RUN - the inputs are still strings)
# bmi = weight / height ** 2
print(type(weight))
print(type(height))

# Convert the string to an integer or float
weight = int(weight)
height = float(height)
bmi = weight / height ** 2
print('bmi =', bmi)

Weight is 80
Height is 2
<class 'str'>
<class 'str'>
bmi = 20.0
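
A few of the less obvious conversions are sketched below with fixed values instead of `input()`, so the results are predictable. The variable names are illustrative; the line that would fail is commented out, as in the cell above.

```python
truncated = int(3.9)             # 3: int() truncates towards zero, it does not round
rounded = round(3.9)             # 4: use round() when rounding is intended
height_m = float("1.75")         # 1.75: text containing a decimal converts with float()
# int("80.5")                    # DOES NOT RUN: int() cannot parse a decimal string
weight_kg = int(float("80.5"))   # 80: convert to float first, then truncate
as_text = str(20.0)              # '20.0': numbers can be converted back to text

print(truncated, rounded, height_m, weight_kg, as_text)
```

In the BMI example, this means a weight entered as `80.5` would cause `int(weight)` to fail, whereas `float(weight)` would accept both `80` and `80.5`.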


#### Boolean Conversion
*   All types can be converted to `bool` and this is often used in Python to simplify comparisons
*   It is not recommended to do this unless necessary as it can harm readability

In [22]:
print("bool(0) =", bool(0))         # 0 becomes False
print("bool(1) =", bool(1))         # anything non-zero becomes True
print("bool(3452) =", bool(3452))
print("bool('') =", bool(''))       # empty string becomes False
print("bool('bus') =", bool('bus')) # any non-empty string becomes True

bool(0) = False
bool(1) = True
bool(3452) = True
bool('') = False
bool('bus') = True
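
The same rule extends to containers and can produce surprises, which is part of why relying on it can harm readability. A minimal sketch with illustrative names:

```python
empty_list = bool([])          # False: empty containers are False
list_with_zero = bool([0])     # True: the list holds one item, even though that item is 0
text_false = bool("False")     # True: any non-empty string is True, whatever it says
zero_float = bool(0.0)         # False: zero is False for floats too

print(empty_list, list_with_zero, text_false, zero_float)

# An explicit comparison is usually easier to read than an implicit conversion
waiting_list = []
if len(waiting_list) == 0:
    print("No patients in the waiting list")
```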


## Control Flow
This section will cover control flow statements such as if statements and loops

### If...Else
The `If...Else` instruction allows your code to perform different tasks based on some supplied condition.

These instructions are of the format:
> If `condition` then do `action`

But can also include a default action:
> If `condition` then do `action`  
> else do `default action`

They can even include multiple conditions:
> If `condition 1` then do `action 1`  
>   else if `condition 2` then do `action 2`  
>   else do `default action`

#### If...Else
*   Here we create a program that analyses an input resting heart rate using if...else
*   **WARNING: Indentation of the `action` code is very important**

In [None]:
heart_rate_string = input('Input resting heart rate (BPM)... ')
heart_rate = float(heart_rate_string)

# Example if...then
if heart_rate > 100:
  print('Heart rate of ' + str(heart_rate) + ' BPM indicates tachycardia')

# Example if...then...else
if heart_rate > 100:
  print('Heart rate of ' + str(heart_rate) + ' BPM indicates tachycardia')
else:
  print('Heart rate of ' + str(heart_rate) + ' BPM is normal \U0001f600')

Heart rate of 150.0 BPM indicates tachycardia
Heart rate of 150.0 BPM indicates tachycardia


#### If...Elif...Else
*   Here we extend the example to consider low heart rate conditions
*   We also include a more complicated condition 

In [None]:
heart_rate_string = input('Input resting heart rate (BPM)... ')
heart_rate = float(heart_rate_string)

# Example if...elif...else
if heart_rate > 100:
  print('Heart rate of ' + str(heart_rate) + ' BPM indicates TACHYCARDIA')
elif heart_rate < 60:
  print('Heart rate of ' + str(heart_rate) + ' BPM indicates BRADYCARDIA ')
else:
  print('Heart rate of ' + str(heart_rate) + ' BPM is normal \U0001f600')

# The conditions can be anything that evaluates to a boolean
age_string = input('Input age (years)... ')
age = float(age_string)

if heart_rate > 100 and age >= 18:
  print('Heart rate of ' + str(heart_rate) + ' BPM indicates adult tachycardia')
  print(' - Age of ' + str(age) + ' years')
  print(' - Indicates adult tachycardia')

Heart rate of 150.0 BPM indicates TACHYCARDIA
Heart rate of 150.0 BPM indicates adult tachycardia
 - Age of 34.0 years
 - Indicates adult tachycardia


### For... Loops
*   For loops allow you to iterate over a sequence of inputs and apply some operations for/to each of them
*   They have the structure:  

> For each `object` in `sequence` do `action`

*   This is useful if you want to repeat the same action many times
*   For example, if we want to print all of the names in a list

#### For... loop Example
*   If we want to print a list of patient names we can do the following...

```
print(patients[0])
print(patients[1])
print(patients[2])
etc.
```

*   However, this is inefficient for long lists
*   And making changes (e.g. change the print to `print('Name:', patients[0])`) is difficult
*   A for...loop implementation is as follows:

In [25]:
patients = ['Oliver', 'Amelia', 'Noah', 'Emma', 'Liam']

# For each 'name' in the list 'patients'
for name in patients:
  print(name)

Oliver
Amelia
Noah
Emma
Liam


#### Range Function
*   The `range()` function is often helpful when writing for...loops
*   This generates a sequence of numbers between limits and can be used to create loop indices
*   e.g. combining it with the `len()` function we can generate indices for a list and use it to index across two lists simultaneously

In [None]:
# A new corresponding list of medical record numbers for each patient
mrn = [101, 214, 394, 395, 619]

# Compute the number of patients
n_patients = len(patients)

# For each element in the lists
for index in range(n_patients):
  
  # Print the MRN and corresponding patient name
  print(mrn[index], patients[index])

101 Oliver
214 Amelia
394 Noah
395 Emma
619 Liam
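
`range()` also accepts a start value and a step, and the stop value is always excluded. The sketch below shows these forms, followed by two built-in functions, `zip()` and `enumerate()`, that are common alternatives to indexing with `range(len(...))`. They are offered here as an option, not as a replacement for the approach above.

```python
first_five = list(range(5))        # [0, 1, 2, 3, 4] - the stop value is excluded
two_to_five = list(range(2, 6))    # [2, 3, 4, 5]
evens = list(range(0, 10, 2))      # [0, 2, 4, 6, 8] - the third value is the step
print(first_five, two_to_five, evens)

patients = ['Oliver', 'Amelia', 'Noah', 'Emma', 'Liam']
mrn = [101, 214, 394, 395, 619]

# zip() pairs up items from both lists without needing an index
for number, name in zip(mrn, patients):
    print(number, name)

# enumerate() gives the index alongside each item
for index, name in enumerate(patients):
    print(index, name)
```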


### While... Loops
*   While loops allow you to repeatedly apply some code.
*   They have the structure:

> While `condition` do `action`

*   Again, note the importance of indentation.
*   e.g. Here, we print the MRN and associated patient name (again) using a while...loop

In [29]:
# Initialise the index to zero
index = 0

# Keep looping until the index is out of the list
while index < len(mrn):
  print(mrn[index], patients[index])

  # Increment the index for the next loop
  index += 1


101 Oliver
214 Amelia
394 Noah
395 Emma
619 Liam


#### While Loop Conditions
*   While loop conditions can contain any boolean expression
*   e.g. Stop printing the MRN/patient when their name has fewer than 6 characters

In [30]:
index = 0
while index < len(mrn) and len(patients[index]) >= 6:
  print(mrn[index], patients[index])
  index += 1


101 Oliver
214 Amelia
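
While loops are most useful when the number of repetitions is not known in advance. As a hypothetical sketch (the values are made up for illustration), the loop below counts how many times a value must be halved before it drops below a threshold. The condition must eventually become `False`, otherwise the loop never ends.

```python
# Hypothetical starting value and threshold
concentration = 100.0
threshold = 10.0
halvings = 0

# Keep halving while the value is still at or above the threshold
while concentration >= threshold:
    concentration = concentration / 2
    halvings += 1

print(halvings, concentration)
```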


## ❓ Exercise 1.1
[Colab notebook](https://colab.research.google.com/github/stuartbowyer/sdsai-lecture-notes/blob/main/Examples_Exercises/Exercise01.ipynb#scrollTo=exercise_1_1)

## ❓ Exercise 1.2
[Colab notebook](https://colab.research.google.com/github/stuartbowyer/sdsai-lecture-notes/blob/main/Examples_Exercises/Exercise01.ipynb#scrollTo=exercise_1_2)

# Wrap Up

## Help Writing Code
*   Practical Tips
    *   Plan your code
    *   Use comments and docstrings to outline what something does
    *   Keep checking whether what you are writing already exists in a library
    *   Take a break when you are stuck on an issue
    *   PEP 20 The Zen of Python - https://peps.python.org/pep-0020/

*   Documentation and Reference
    *   Python 3 Standard Library - https://docs.python.org/3/library/index.html 
    *   W3 Schools - https://www.w3schools.com/python/python_reference.asp
    *   PEP 8 Style Guide - https://peps.python.org/pep-0008/ 

*   Further Tutorials (Free)
    *   These will help if you want some more info on a specific area or just want more practice
    *   Interactive exercises - https://www.datacamp.com/courses/intro-to-python-for-data-science
    *   Exhaustive reference tutorial - https://www.w3schools.com/python/python_intro.asp 


## More Help Writing Code
*   Forums and Search
    *   Try to consider the source and reputation of information you find online
    *   **Never** just run code without reviewing and understanding it - risks are significant
    *   Stack Overflow - https://stackoverflow.com/questions (search extensively before asking)
    *   Google search
*   Tutorials and Office Hours
    *   Bring questions to future tutorials
    *   Email me: stuart.bowyer@imperial.ac.uk
    *   We can set up office hours for later tutorials


## Generative AI Code Tools
You can use these tools - it is not 'cheating'. However, you **must** be careful.

<img src="https://teaching.bowyer.ai/sdsai/resources/1/img/LLM_Coding.png" alt="LLM Coding" width="50%">

### Warnings

* **Confidentiality is Absolute.** NEVER input PHI or sensitive data (or proprietary methods). Assume all inputs are made public.
* **AI Hallucinates.** It will confidently invent functions, libraries, and logic that don't exist - and will argue forcefully. It prioritizes sounding correct over being correct.
* **Code is Often Flawed.** Expect bugs, outdated libraries, insecure code, or even damage to data or computer. Always verify before running.
* **The Crutch Trap Stifles Learning.** Over-reliance prevents you from building core problem-solving skills. Struggle is essential for learning to code. Your knowledge gap will grow.
* **Lacks Reproducibility.** AI gives inconsistent answers. You must understand your code to ensure it is reproducible for research.

### Tips / Best Practices

* **You Are the Final Authority.** The AI is a tool that is often confidently wrong. You are responsible for understanding, testing, and owning every line of code.
* **Be a Specific Prompter.** Provide clear context and constraints.
    * **Context:** e.g. "My data has columns X, Y, Z..."
    * **Constraints:** e.g. "Use only the `pandas` library."
    * **Persona:** e.g. "Explain this like I'm a clinician learning to program."
    * **Grounding:** e.g. "Using the documentation I've provided below, explain the `normalize` parameter."
* **Think Small, Test Often.** Decompose your problem. Ask for help with one tiny step, then test the result immediately.
* **Use as a Tutor, Not a Coder.** Best for learning and support.
    * **Explain:** "Explain this `function`."
    * **Refactor:** "Make this code more efficient."
    * **Debug:** "Why am I getting an `ERROR` here?"
* **Verify with Official Documentation.** If you see a new function, look it up. The documentation is the source of truth, not the AI.
* **Keep the Context Clean.** Long conversations can confuse the AI. If your topic changes or you get strange errors, start a new chat with only the most relevant information.
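
One way to put "Think Small, Test Often" into practice is to check any suggested code against values you have worked out by hand. The sketch below assumes a hypothetical `bmi` function (functions are covered in more detail later in the course) and uses `assert`, which raises an error if the check fails.

```python
def bmi(weight_kg, height_m):
    """Return body mass index (kg/m^2) from weight in kg and height in m."""
    return weight_kg / height_m ** 2

# Check against values worked out by hand before trusting the code
assert bmi(80, 2) == 20.0
assert round(bmi(70, 1.75), 1) == 22.9

print("All checks passed")
```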

## MIMIC-IV
*   For the applied lectures later in the course, we will be using the open dataset MIMIC-IV
    *   https://physionet.org/content/mimiciv/ 

*   Access requires a few steps:
    *   Registering on the website
    *   Completing training on the use of data for research
    *   Applying for credentialled access to view the dataset

*   Detailed instructions are available here: https://bb.imperial.ac.uk/webapps/blackboard/execute/content/blankPage?cmd=view&content_id=_3248343_1&course_id=_42130_1 

*   <span style="color:red">**IT IS VERY IMPORTANT THAT YOU START THIS PROCESS SOON - THE TRAINING COURSE AND CREDENTIAL REVIEW CAN TAKE SOME TIME**</span>


## Before Next Session
*   Start the process of getting access to MIMIC
*   Make sure you feel comfortable with all the code written today and tutorial solutions
    *   There are some further tutorials on the [Leganto reading list](https://imperial.alma.exlibrisgroup.com/leganto/public/44IMP_INST/lists/45286392440001591?auth=SAML&section=45286392460001591)

# Self-Study Appendices
Contextual Background, Supplementary Information, and Independent Study Resources

## What is Programming?

<table style='table-layout: fixed; width: 100%; margin-top: 0;'>
  <tbody>
    <tr>
      <td>
        <ul>
          <li>Programming allows us to give instructions to a computer so that it can perform some required computation</li>
          <li>Plugboards and punch cards were originally used to provide instructions</li>
          <li>This has obvious limitations of programming speed, reusability, and scalability</li>
        </ul>
      </td>
      <td>
        <img src="https://live.staticflickr.com/3576/3660521353_4d6a32bd0b.jpg">
        <p class="attribution"><small>"<a rel="noopener noreferrer" href="https://www.flickr.com/photos/17157315@N00/3660521353">ENIAC</a>" by <a rel="noopener noreferrer" href="https://www.flickr.com/photos/17157315@N00">thekirbster</a> is licensed under <a rel="noopener noreferrer" href="https://creativecommons.org/licenses/by/2.0/?ref=openverse">CC BY 2.0 <img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg" style="height: 1em; margin-right: 0.125em; display: inline;" /><img src="https://mirrors.creativecommons.org/presskit/icons/by.svg" style="height: 1em; margin-right: 0.125em; display: inline;" /></a>.</small></p>
      </td>
    </tr>
  </tbody>
</table>

### Low-level Languages
*e.g. Assembly and Machine Code*

*   Allow computer instructions to be written in a standardised language

***

```
section .data
    message: db "Hello, world!", 10
    len: equ $-message

section .text
    global _start

_start:
    mov eax, 4      ; system call number for write
    mov ebx, 1      ; file descriptor for standard output
    mov ecx, message ; address of the message
    mov edx, len    ; length of the message
    int 0x80        ; invoke the system call

    mov eax, 1      ; system call number for exit
    mov ebx, 0      ; exit status
    int 0x80
```

### (Early) High-level Languages
*e.g. FORTRAN and C*

*   Language abstracted from the computing architecture
*   Improved portability and reusability

***

```
program hello_world
print *, "Hello, world!"
end program hello_world
```

### Modern High-level Languages
*e.g. Python and JavaScript*

*   Increased abstraction
*   Increased simplicity
*   Increased error tolerance and prevention
*   Increased capabilities
*   Increased specialisation for modern tasks (e.g. web)

In [1]:
print("Hello, World!")

Hello, World!


### 'No-Code' Programming Approaches

<table style='table-layout: fixed; width: 100%; margin-top: 0;'>
  <tbody>
    <tr>
      <th>Natural Language Programming</th>
      <th>Visual Programming</th>
      <th>Interactive Data Visualisation (and Analytics) Tools</th>
    </tr>
    <tr>
      <td>
        <img src="https://teaching.bowyer.ai/sdsai/resources/1/img/ChatGPT_HelloWorld.png"></img>
      </td>
      <td>
        <img src="https://upload.wikimedia.org/wikipedia/commons/d/d2/Knime_5.2_GUI.png">
        <small>By Unknown author - www.knime.com, <a href="https://creativecommons.org/licenses/by-sa/4.0" title="Creative Commons Attribution-Share Alike 4.0">CC BY-SA 4.0</a>, <a href="https://commons.wikimedia.org/w/index.php?curid=147486132">Link</a></small>
      </td>
      <td>
        <img src="https://upload.wikimedia.org/wikipedia/commons/7/73/Global_Temperatures_Server.png">
        <small>By <a href="//commons.wikimedia.org/w/index.php?title=User:Marissa-anna&amp;action=edit&amp;redlink=1" class="new" title="User:Marissa-anna (page does not exist)">Marissa-anna</a> - <span class="int-own-work" lang="en">Own work</span>, <a href="https://creativecommons.org/licenses/by-sa/4.0" title="Creative Commons Attribution-Share Alike 4.0">CC BY-SA 4.0</a>, <a href="https://commons.wikimedia.org/w/index.php?curid=74539538">Link</a></small>
       </td>
    </tr>
    <tr>
      <td>
        <p>Generate code or whole programs directly from natural-language instructions, offering a higher level of abstraction from the hardware and the programming language</p>
      </td>
      <td>
        <p>Originally for teaching/understanding basic algorithms, but increasingly available for cloud data/ML pipelines</p>
      </td>
      <td>
        <p>Allow the exploration and analysis of data by selecting from analytics methods and visualisations</p>
      </td>
    </tr>
  </tbody>
</table>

**We will not be focusing on these**


## Other Programming Languages

<table style='table-layout: fixed; width: 100%; margin-top: 0;'>
  <tbody>
    <tr>
      <th>SQL</th>
      <th>R</th>
      <th>Julia</th>
      <th>Prolog</th>
      <th>C++/Java</th>
      <th>MATLAB/ SAS/ STATA</th>
    </tr>
    <tr>
      <td>
        <ul>
          <li>Ubiquitous language for interacting with databases</li>
          <li>Allows efficient extraction, manipulation and analysis of data</li>
          <li>Modern data platforms allow very high performance over petabytes</li>
        </ul>
      </td>
      <td>
        <ul>
          <li>Widely used language with tidy syntax for data manipulation and analytics</li>
          <li>Has many libraries for data science and machine learning</li>
          <li>Easily creates good-looking visualisations</li>
        </ul>
      </td>
      <td>
        <ul>
          <li>A modern Python-like language with a focus on data science and AI</li>
          <li>Improved performance over Python</li>
          <li>Nice inbuilt visualisation libraries</li>
        </ul>
      </td>
      <td>
        <ul>
          <li>A declarative language that allows elegant representation of logical rules</li>
          <li>Extensively used in logic-based AI</li>
        </ul>
      </td>
      <td>
        <ul>
          <li>C++ and Java are both widely used, high performance languages</li>
          <li>C++ is appropriate for data processing that requires specialised system programming</li>
          <li>Java is very common in server-side programming for data platforms</li>
        </ul>
      </td>
      <td>
        <ul>
          <li>Proprietary languages/packages for scientific, mathematical and statistical computing</li>
          <li>In general, I recommend using free/open alternatives</li>
        </ul>
      </td>
    </tr>
  </tbody>
</table>
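
To give a feel for how SQL extracts data, the sketch below runs a small query from Python using the standard-library `sqlite3` module and an in-memory database. The table name, columns and rows are all hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this example
conn.execute("CREATE TABLE patients (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [("Oliver", 54), ("Amelia", 67), ("Noah", 41)],
)

# The SQL statement describes *what* data is wanted, not how to loop over it
rows = conn.execute(
    "SELECT name FROM patients WHERE age > 50 ORDER BY name"
).fetchall()
print(rows)

conn.close()
```

Each row comes back as a tuple, so a single-column query returns a list of one-element tuples.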

## Possible Python Development Environments

<table style='table-layout: fixed; width: 100%; margin-top: 0;'>
  <tbody>
    <tr>
      <th>Text Editor and Terminal</th>
      <th>Integrated Development Environment</th>
      <th>Notebooks</th>
    </tr>
    <tr>
      <td>
        <img src="https://teaching.bowyer.ai/sdsai/resources/1/img/Python_Terminal.png">
      </td>
      <td>
        <img src="https://teaching.bowyer.ai/sdsai/resources/1/img/Python_IDE.png">
      </td>
      <td>
        <img src="https://teaching.bowyer.ai/sdsai/resources/1/img/Python_Colab.png">
       </td>
    </tr>
    <tr>
      <td>
        <p>Advantages</p>
        <ul>
          <li>Simple</li>
        </ul>
      </td>
      <td>
        <p>Advantages</p>
        <ul>
          <li>Extensive debug support</li>
          <li>Syntax support</li>
          <li>Architecture support</li>
          <li>Fully flexible</li>
        </ul>
        <p>e.g. <a href="https://www.jetbrains.com/pycharm/">PyCharm</a>, <a href="https://code.visualstudio.com/">VS Code</a></p>
      </td>
      <td>
        <p>Advantages</p>
        <ul>
          <li>Very easy to get started</li>
          <li>Some syntax support</li>
          <li>Enables clean annotated code</li>
          <li>Ideal for learning and exploration</li>
        </ul>
        <p>e.g. <a href="https://jupyter.org/">Jupyter</a>, <a href="https://colab.google/">Colab</a></p>
      </td>
    </tr>
  </tbody>
</table>

## Additional Operators and Functions

#### Assignment Operators
*   You will often want to assign the result of an operation to a variable
*   Augmented assignment operators (such as +=) make it simpler to modify existing variables

In [None]:
# Assignment to a new variable
c = a + b
print(c)

# Assignment to (overwrite) an existing variable
a = a + b
print(a)              # a has been increased by 3

# Assignment operators simplify this syntax (also /=, //=, %=)
a += b
print("a += b:", a)
a -= b
print("a -= b:", a)
a *= b
print("a *= b:", a)

14
14
a += b: 17
a -= b: 14
a *= b: 42
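
The other operators named in the comment above (/=, //=, %=) follow the same pattern. Below is a minimal sketch, assuming `a = 11` and `b = 3`; these values are chosen here to be consistent with the printed outputs, since the notebook sets them in an earlier cell. `**=` is included as one further operator not listed above.

```python
# Assumed starting values (consistent with the outputs printed above)
a = 11
b = 3

true_div = a
true_div /= b       # true division always gives a float (about 3.67)

floor_div = a
floor_div //= b     # floor division discards the remainder: 3

remainder = a
remainder %= b      # modulo keeps only the remainder: 2

power = a
power **= b         # exponentiation: 11 * 11 * 11 = 1331

print("a /= b: ", true_div)
print("a //= b:", floor_div)
print("a %= b: ", remainder)
print("a **= b:", power)
```

Each variable starts as a copy of `a`, so the four results can be compared side by side rather than compounding as they do in the cell above.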


#### For...loop Special Statements
*   There are two special statements that modify for loops
    *   break - exits/stops the loop entirely
    *   continue - stops the current iteration and jumps to the next iteration

#### For...loop Break
*   The 'break' allows you to terminate on given conditions
*   e.g. Stop printing patient names when you find one with fewer than 6 characters

In [None]:
for name in patients:
  print(name)
  if len(name) < 6:          # a name shorter than 6 characters ends the loop
    print(' * break-ing')
    break

Oliver
Amelia
Noah
 * break-ing
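
A common use of 'break' is to stop searching as soon as a match is found. The sketch below is self-contained, so it defines its own `patients` list; the fourth name is hypothetical, because the notebook's list is defined in an earlier cell, and `first_short` is a name introduced here for illustration.

```python
# Hypothetical patient list; the notebook defines `patients` in an earlier cell
patients = ["Oliver", "Amelia", "Noah", "Olivia", "Liam"]

first_short = None              # stays None if no name matches
for name in patients:
    if len(name) < 6:
        first_short = name
        break                   # no need to check the remaining names

print(first_short)
```

Without the 'break', the loop would carry on and `first_short` would end up holding the last matching name rather than the first.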


#### For...loop Continue
*   The 'continue' allows you to skip on given conditions
*   e.g. Do not print patient names ending with an 'a'

In [None]:
for name in patients:
  if name.endswith('a'):     # skip names ending with 'a'
    print(' * continue-ing')
    continue
  print(name)

Oliver
 * continue-ing
Noah
 * continue-ing
Liam
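
The same 'continue' pattern can be used to build a filtered list instead of printing. This is a minimal sketch: the `patients` list is defined again so the block is self-contained, its fourth name is hypothetical, and `kept` is a name introduced here for illustration.

```python
# Hypothetical patient list; the notebook defines `patients` in an earlier cell
patients = ["Oliver", "Amelia", "Noah", "Olivia", "Liam"]

kept = []
for name in patients:
    if name.endswith("a"):
        continue                # skip this name and move to the next one
    kept.append(name)           # only reached when the name was not skipped

print(kept)
```

Placing the 'continue' check at the top of the loop keeps the main work (here, `kept.append`) unindented and easy to read.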
