Urban Data Science & Smart Cities <br>
URSP688Y Spring 2025<br>
Instructor: Chester Harvey <br>
Urban Studies & Planning <br>
National Center for Smart Growth <br>
University of Maryland

# Demo 1 - Programming fundamentals

- Pseudocode
- Notebooks
- Intro to programming (with Python)
    - Why Python?
    - Variables
    - Syntax vs. style
    - Basic data types
    - Conditions and loops
    - Errors and debugging

## Pseudocode

You can hash out programming logic without even writing code.

In fact, it's a great idea to start with pseudocode.

That way you can think big-picture without getting distracted by the intricacies of syntax or availability of pre-existing components.

In [5]:
# Pseudocode for making grilled cheese sandwiches with two different approaches

# HIGHER-LEVEL Option A (one-at-a-time style)
# ============
# Count people
# Cook a sandwich for each person

# HIGHER-LEVEL Option B (batch style)
# ============
# Count people
# Do assembly step 1 for all sandwiches at once
# Do assembly step 2 for all sandwiches at once
# ....
# Do assembly step n for all sandwiches at once
# Cook all sandwiches

# LOWER-LEVEL based on Option B
# =============
# Make a list of who wants a sandwich
# Count the number of names on the list
# Get out a griddle large enough to fit that many sandwiches; preheat the griddle
# Get out bread slices for that number of people x 2
# Figure out how many cheese slices you need per sandwich
# Cut (cheese slices per sandwich) x (number of people) cheese slices
# Butter one side of each bread slice
# Group bread slices into pairs
# Assemble each pair into a sandwich with cheese between two bread slices with butter facing out
# Put all sandwiches on the hot griddle; wait 4 minutes
# Flip all sandwiches
# Wait until the second side is golden; serve

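To preview how pseudocode turns into real code, here's a minimal Python sketch of the first few lower-level steps of Option B: counting people and working out how much bread and cheese to get out. It uses a few tools we'll cover later in this demo. The names on the list and the two cheese slices per sandwich are made-up assumptions for illustration.

```python
# Hypothetical list of who wants a sandwich
sandwich_list = ['Ana', 'Ben', 'Chi']

# Count the number of names on the list
num_people = len(sandwich_list)

# Two bread slices per sandwich
bread_slices = num_people * 2

# Assumption: 2 cheese slices per sandwich
cheese_per_sandwich = 2
cheese_slices = cheese_per_sandwich * num_people

print(f'{num_people} sandwiches need {bread_slices} bread slices '
      f'and {cheese_slices} cheese slices')
```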
## Notebooks
  - An easy-to-use interface for basic coding
  - "Interactive"
    - This means that you can run code in small blocks and see the output immediately
    - Facilitates iterative coding, where you try something, see if it works, change it, and try again
  - Good for documenting and sharing your work
    - Mix code blocks with text blocks (in [Markdown](https://www.markdownguide.org/basic-syntax/))
    - Easily shareable with others on a team
  - Runnable in the cloud
    - You can run notebooks in the cloud with [Google Colab](https://colab.research.google.com/).
    - To open any notebook stored in a GitHub repo in Colab, just replace `https://github.com` with `https://colab.research.google.com/github` in its URL.
        - For example, to open `https://github.com/ncsg/ursp688y_sp2025/blob/main/demos/demo01/demo01.ipynb`, go to [`https://colab.research.google.com/github/ncsg/ursp688y_sp2025/blob/main/demos/demo01/demo01.ipynb`](https://colab.research.google.com/github/ncsg/ursp688y_sp2025/blob/main/demos/demo01/demo01.ipynb)
    - Colab notebooks are very limited. It's hard to:
      - Connect to data files
      - Commit code to GitHub
      - Run code that needs more than minimal computing resources
    - Colab is great for experimentation and demonstration, but bad for building code and conducting analyses.
  - Running Python and Jupyter directly on your computer offers far more capability and makes it easier to work with GitHub.
    - We'll be installing these next week.

[Keyboard shortcuts](https://towardsdatascience.com/jypyter-notebook-shortcuts-bf0101a98330)

### Everything is in cells. There are two major types of cells:
- `Text`: for writing notes (In JupyterLab, this is called a `Markdown` cell)
- `Code`: for writing code

*NOTE*: JupyterLab also has 'Raw' cells, which we're not going to get into. Their contents are neither run as code nor rendered as Markdown, and they're rarely used in practice.

### To run a cell:
- Press the play button
- Keyboard shortcuts:
    - Run and stay on the current cell:  `Ctrl (Command) + Enter (Return)`
    - Run and move to the next cell:     `Shift + Enter (Return)`

Here's a markdown cell. You can **bold**, _italicize_, or do _**both**_!

To edit a markdown cell, double click on it.

[Here's a cheat sheet for markdown styling.](https://www.kaggle.com/code/cuecacuela/the-ultimate-markdown-cheat-sheet)

In [1]:
# Here's a code cell. Try executing it and see what happens.

print('Hello world')
print(1 + 1)

Hello world
2


### To make a new cell:
- In Colab: Press the `+ Code` or `+ Text` button
- In JupyterLab: Press the `+` button

### To delete a cell:
- Click the trash button

### Cell order
When you run a cell, its code runs in the order you run it. If a cell relies on something from a previous cell, you have to run that cell first. What matters is the order in which you *run* cells, not their order in the notebook.

*NOTE*: This is a key exception to the idea that code runs from top to bottom, which we'll talk about soon.

In [2]:
print(favorite_color)

NameError: name 'favorite_color' is not defined

In [3]:
favorite_color = 'blue'

For simplicity, it's good practice to keep cells in the order they should be run. You can move them with the little up and down arrows on the right.

## Intro to Python Programming

### Python is:  
  - A scripting language
    - *fairly* human-readable (not just 0s and 1s)
  - Used for a wide range of applications
    - Scientific programming (R is also a popular scientific language, but not nearly as widely used for production software)
    - Production software development
  - Open source
    - Free to use and adapt for your own purposes
  - *Very* large support community
    - Google is your best friend
  - Highly extensible
    - Many freely-available packages with pre-built tools
    - You can modify existing packages
    - You can write your own custom package
  - Multi-platform
    - Runs on a Mac, PC, Linux, Unix, in the cloud, etc.

<big><big><big><big>NOTE: We are going to spend some extra time on programming fundamentals.</big></big></big></big>

Many data science classes skip the basics and head straight for tabular analysis and machine learning.

However, learning basic programming will make you a better programmer, scientist, and, dare I say, *thinker*. Code built on solid fundamentals is often more:
- Efficient
- Flexible
- Reliable
- Transferable

### Syntax

The rules for how code must be written so that a computer can understand it.

Code has to follow grammatical rules, just like any language. 

### Some basic syntax
- Variable assignment
- Operators
- Statements
- Indentation
- Comments

### Style

Some aspects of how code is written don't change how a computer interprets it, but do make it easier or harder for a human to understand.

- *Syntax*: will my code run (be readable by the computer)?
- *Style*: will my code be easily readable by a human (including myself!)?

#### Variables

- Name-based storage containers
- Can be easily recalled
- Can usually be updated

Variable names are largely arbitrary. They can be any combination of letters, numbers, and underscores (`_`). BUT, they can't contain spaces, they can't start with a numeral, and they can't be one of Python's reserved keywords (like `if`, `for`, or `True`).

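Python can check these naming rules for you. Here's a minimal sketch using the built-in string method `str.isidentifier()` and the standard-library `keyword` module; the candidate names are made up for illustration.

```python
import keyword

# Made-up candidate variable names to check
candidates = ['day', 'first_day', 'day1', '1st_day', 'first day', 'for']

for candidate in candidates:
    # isidentifier() checks the naming rules
    # (letters, digits, underscores; no leading digit; no spaces)
    # iskeyword() checks whether Python reserves the name for itself
    is_valid = candidate.isidentifier() and not keyword.iskeyword(candidate)
    print(f'{candidate!r}: {is_valid}')
```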
In [4]:
name = 'Chester'
print(name)

Chester


In [5]:
day = 'Monday'

The abstraction of variables is useful for changing inputs to make new outputs.

In [6]:
calendar_text = f'The day of the week is {day}'
print(calendar_text)

The day of the week is Monday


In [7]:
day = 'Thursday'
print(calendar_text)

The day of the week is Monday

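Why didn't `calendar_text` change? An f-string is evaluated once, at the moment its line runs, so `calendar_text` holds the finished text `'The day of the week is Monday'`, not a live link back to `day`. To pick up the new value, rebuild the string after changing `day`:

```python
day = 'Monday'
calendar_text = f'The day of the week is {day}'

day = 'Thursday'
print(calendar_text)  # still built from the old value of day

# Re-running the f-string uses the current value of day
calendar_text = f'The day of the week is {day}'
print(calendar_text)  # The day of the week is Thursday
```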

What's wrong with this code?

In [8]:
1st_day_of_the_week = 'Monday'

SyntaxError: invalid decimal literal (3013576040.py, line 1)

#### Operators

[Here's a complete list](https://www.tutorialspoint.com/python/python_basic_operators.htm)

In [9]:
# Arithmetic (returns a number)
1 + 1
1 - 1
1 * 1
1 / 1
1**1 # exponent
1 % 1 # modulus: remainder after division
1 // 1 # floor division: quotient rounded down to the nearest whole number

1

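In a notebook, only the value of a cell's *last* expression is displayed, which is why the cell above shows just `1`. To see every result, wrap each one in `print()`. This sketch uses `7` and `2` (arbitrary example values) so the differences between operators are visible, and includes a negative number to show that floor division rounds *down* rather than simply dropping decimals.

```python
a = 7
b = 2

print(a + b)    # 9
print(a - b)    # 5
print(a * b)    # 14
print(a / b)    # 3.5 (/ always returns a float)
print(a ** b)   # 49
print(a % b)    # 1 (because 7 = 3 * 2 + 1)
print(a // b)   # 3
print(-a // b)  # -4 (rounded down, not toward zero)
```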
In [10]:
# Comparison (returns True or False)
1 == 1 # equality
1 != 1 # inequality
1 > 1
1 >= 1

True

In [1]:
# Assignment
x = 1
x += 1
x

2

In [12]:
# Logical (inputs must be True or False; returns True or False)
a = True
b = False
a and b
a or b

True

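As with the arithmetic cell, only `a or b` is displayed above. Printing each result, plus the `not` operator (which flips a single True/False value), shows the full behavior:

```python
a = True
b = False

print(a and b)  # False: and needs both sides to be True
print(a or b)   # True: or needs at least one side to be True
print(not a)    # False: not flips a single value
print(not b)    # True
```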
In [13]:
# Membership (returns True or False)
'e' in 'Chester'

True

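`in` also works on lists, where it checks whether an item is one of the elements, and `not in` is its opposite. The fridge items here are just example values.

```python
fridge = ['milk', 'apple', 'celery']

print('apple' in fridge)       # True
print('cheese' in fridge)      # False
print('cheese' not in fridge)  # True

# For strings, in checks for a substring, not just a single character
print('est' in 'Chester')      # True
```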
#### Statements

In [14]:
# A single unit of instruction, usually written on one line.
# Statements are executed in order.
a = 1
a + 2

3

#### Comments

Quickly comment or uncomment a line of code (or multiple selected lines) with `Ctrl (Command) + /`

In [16]:
# Super helpful for annotating your code, especially outside of notebooks, where you don't have text cells

# They're also helpful for temporarily "commenting out" code you don't want to run, but don't want to delete
# 1 + 1

### Basic Data Types

#### String
Text. Must be surrounded by either double (") or single (') quotes

In [17]:
name = "Chester"
print(type(name)) # This statement uses the type and print functions to show that name is, in fact, a string
print(name) # This statement uses the print function to show the contents of name

<class 'str'>
Chester


[F-strings](https://realpython.com/python-f-strings/) are a super handy syntax for building strings with variables.

In [18]:
f'My name is {name}'

'My name is Chester'

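The curly braces in an f-string can hold any expression, not just a variable name, and an optional format specifier after a colon controls how numbers are displayed. A small sketch; the `price` and `quantity` values are made up:

```python
price = 3.5
quantity = 4

# An expression inside the braces is evaluated before being inserted
total_text = f'{quantity} items cost {price * quantity} dollars'
print(total_text)  # 4 items cost 14.0 dollars

# ':.2f' formats a number with exactly two decimal places
price_text = f'Each item costs ${price:.2f}'
print(price_text)  # Each item costs $3.50
```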
#### Integer
Number without decimal places

In [19]:
age = 25
print(type(age))
print(age)

<class 'int'>
25


#### Float
Number with decimal places

In [20]:
height = 5.95
print(type(height))
print(height)

<class 'float'>
5.95


#### Boolean
True or False

In [21]:
private_jet = False
print(type(private_jet))
print(private_jet)

<class 'bool'>
False

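You can convert between these basic types with the built-in functions `int()`, `float()`, and `str()`. This matters because values read from files or typed in by users often arrive as strings that only *look* like numbers. A minimal sketch with made-up values:

```python
age_text = '25'              # a string that looks like a number
age = int(age_text)          # convert the string to an integer
height = float('5.95')       # convert a string to a float

print(type(age_text), type(age), type(height))
print(age + 1)               # 26: arithmetic works on the integer
print(age_text + '1')        # 251: + joins strings together
print(str(age) + ' years')   # 25 years

# Comparisons produce booleans
print(age >= 18)             # True
```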

---
We made it this far in class on Week 1

---

### Basic style guidelines for Python
- At the very least, do *everything* consistently
- One statement per line
- Try to limit lines to 79 characters (72 for comments and docstrings), as recommended by PEP 8
- Use four spaces to indent
- Put spaces around operators (e.g., `1 + 1` or `day = 'Monday'`), except around `=` in keyword function arguments (e.g., `print('Hi', end='!')`)
- Use blank lines intentionally and consistently
- Use meaningful names
- Name variables and functions with `lowercase_underscores`
- Constants are often named in `ALL_CAPS_WITH_UNDERSCORES` (e.g., `SPEED_OF_LIGHT = 2.99792458e8`)
- Name custom classes with `CapWords`
- In general, avoid spaces in folder and filenames used for programming

See [Code Readability](https://github.com/ncsg/ursp688y_sp2024/blob/main/README.md#code-readability) on the syllabus. [CS61A](https://cs61a.org/articles/composition/) has an excellent composition guide. [PEP 8](https://peps.python.org/pep-0008/) is a standard Python style guide. [Google](https://google.github.io/styleguide/pyguide.html) publishes its internal Python style guide.

### Composite Data Types

#### List
An ordered, changeable sequence of objects.

In [22]:
fridge_contents = ['milk', 'apple', 'celery', 'yogurt']
print(type(fridge_contents))
print(fridge_contents)

<class 'list'>
['milk', 'apple', 'celery', 'yogurt']


In [23]:
# You can add lists together
fridge_contents = fridge_contents + ['orange juice', 'leftovers']
fridge_contents

['milk', 'apple', 'celery', 'yogurt', 'orange juice', 'leftovers']

In [24]:
# Or append elements to a list
fridge_contents.append('cheese')
fridge_contents

['milk', 'apple', 'celery', 'yogurt', 'orange juice', 'leftovers', 'cheese']

In [25]:
# Or remove things
fridge_contents.remove('yogurt')
fridge_contents

['milk', 'apple', 'celery', 'orange juice', 'leftovers', 'cheese']

In [26]:
# You can look things up in a list by index number, starting with 0
fridge_contents[0]

'milk'

In [27]:
# Or get just part of a list with "slicing"
fridge_contents[:2]

['milk', 'apple']

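A few more indexing and slicing patterns, using a fresh copy of the list as it stands above (after `'yogurt'` was removed). Negative indices count from the end, and a slice `[start:stop]` includes `start` but stops *before* `stop`:

```python
fridge_contents = ['milk', 'apple', 'celery',
                   'orange juice', 'leftovers', 'cheese']

print(fridge_contents[-1])   # cheese (the last element)
print(fridge_contents[1:3])  # ['apple', 'celery'] (indices 1 and 2)
print(fridge_contents[-2:])  # ['leftovers', 'cheese'] (the last two)
print(len(fridge_contents))  # 6
```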
#### Dictionary

Labeled data stored as key-value pairs.

*Note*: Dictionaries used to be unordered, but as of Python 3.7 they are guaranteed to preserve the order in which keys were inserted (CPython 3.6 already did this as an implementation detail). Lists are still usually preferred when order matters. There's also something called an [ordered dictionary](https://realpython.com/python-ordereddict/), which makes it more explicit that you care about order and can make it easier to manage/change order.

In [28]:
goodness_at_sports = {
    'basketball': 2,
    'baseball': 1,
    'skiing': 8,
    'volleyball': 3,
}
print(type(goodness_at_sports))
print(goodness_at_sports)

<class 'dict'>
{'basketball': 2, 'baseball': 1, 'skiing': 8, 'volleyball': 3}


In [29]:
# You can add an entry to a dictionary
goodness_at_sports['cornhole'] = 3

In [30]:
# And remove one (.pop() also returns the removed value)
goodness_at_sports.pop('baseball')

1

In [31]:
# And look up values based on keys
goodness_at_sports['skiing']

8

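Looking up a key that isn't in the dictionary raises a `KeyError`. Two safe alternatives are checking membership with `in` (which looks at the keys) and the built-in `dict.get()` method, which returns a default value instead of failing. A sketch using a copy of the dictionary as it stands after the cells above:

```python
goodness_at_sports = {
    'basketball': 2,
    'skiing': 8,
    'volleyball': 3,
    'cornhole': 3,
}

print('skiing' in goodness_at_sports)        # True (in checks keys)
print('curling' in goodness_at_sports)       # False

# get() returns None, or a default you supply, when the key is missing
print(goodness_at_sports.get('curling'))     # None
print(goodness_at_sports.get('curling', 0))  # 0
```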
#### Indentation

Python relies on indentation as a part of its syntax, so it's very important.

Indents tell the code how statements are logically nested.

Press `Tab` to indent, or `Shift + Tab` to unindent.

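JupyterLab inserts four spaces per indent by default. In Colab, the indentation width can be set under `Tools -> Settings -> Editor`; four spaces is the standard Python convention.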

In [15]:
if 5 > 2:
    print('Five is greater than two!')

Five is greater than two!

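Leaving out the indent isn't just bad style; it's a syntax error. This sketch uses Python's built-in `compile()` to check a snippet without running it, and catches the resulting `IndentationError` (a kind of `SyntaxError`) so its message can be printed instead of stopping the notebook:

```python
# The same if statement as above, but with the print() line not indented
unindented = "if 5 > 2:\nprint('Five is greater than two!')\n"

# try/except lets us catch the error instead of letting it stop the code
try:
    compile(unindented, '<example>', 'exec')
    error_name = None
except IndentationError as error:
    error_name = type(error).__name__
    print(f'{error_name}: {error.msg}')
```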

### Programming logic

Now that we've got basic building blocks, we can *do* things with them.

This requires programming logic: using logical statements to control the flow of our code in productive ways.

#### [Conditions](https://realpython.com/python-conditional-statements/)

In [32]:
age = 10
if age < 18:
    print('child')
else:
    print('adult')

# Can we add a third condition for teenager?

child

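One answer to the question in the cell above: add an `elif` ("else if") branch. Python checks the conditions from top to bottom and runs only the first branch whose condition is true. The cutoff of 13 for teenagers is an assumption for illustration.

```python
age = 15

if age < 13:
    age_group = 'child'
elif age < 18:
    # Only reached when age is 13 or older, so this covers ages 13-17
    age_group = 'teenager'
else:
    age_group = 'adult'

print(age_group)  # teenager
```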

#### Loops

Python has both [`for` loops](https://realpython.com/python-for-loop/) and [`while` loops](https://realpython.com/python-while-loop/).

We're going to focus on `for` loops because they're most commonly used in data science. While loops are particularly handy in applications that respond to dynamic inputs from users or data streams.

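For comparison, here's a minimal `while` loop. It keeps repeating as long as its condition is true, so something inside the loop must eventually make the condition false (otherwise it runs forever). The starting population and the 10% annual growth rate are made-up numbers:

```python
population = 1000
years = 0

# Repeat until the population has doubled (assumes 10% growth per year)
while population < 2000:
    population = population * 1.1
    years += 1

print(f'The population doubles after {years} years')
```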
In [33]:
ages = [5, 10, 65, 81, 45]

for age in ages:
    if age < 18:
        print('child')
    else:
        print('adult')

child
child
adult
adult
adult


In [2]:
people = {
    'Daniela': 5, 
    'Zoe': 10,
    'Rowen': 65,
    'Jude': 81,
    'Austin': 45,
}

for name, age in people.items():
    if age < 18:
        age_desc = 'a child'
    else:
        age_desc = 'an adult'
    print(f'{name} is {age_desc}')

Daniela is a child
Zoe is a child
Rowen is an adult
Jude is an adult
Austin is an adult

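Loops are often used to build up a result rather than just print. As a sketch, this counts how many people in the `people` dictionary are children and collects their names in a list (the dictionary is repeated so the example runs on its own):

```python
people = {
    'Daniela': 5,
    'Zoe': 10,
    'Rowen': 65,
    'Jude': 81,
    'Austin': 45,
}

num_children = 0
children = []

for name, age in people.items():
    if age < 18:
        num_children += 1       # update the running count
        children.append(name)   # remember who was counted

print(num_children)  # 2
print(children)      # ['Daniela', 'Zoe']
```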

### Errors and debugging

Errors are frustrating and inevitable. Even professional programmers probably spend most of their time debugging.

Luckily, there are good tools and techniques for making debugging a little easier.

Despite these, you will probably nearly tear your hair out with some frequency, especially as a beginner. It will get better with time.

There are two broad types of errors in programming: logic errors and syntax errors. Both keep your program from achieving its goal, but logic errors can be much harder to detect because the code may still run.

#### Logic errors
These are issues with how you have approached or executed your problem. If your code runs but produces nonsensical results, there is probably a logic error. However, your erroneous code might also produce logical but *wrong* results; you might never notice until the problem has rippled downstream. It's best to address this proactively by planning your code well so it's less likely to be illogical, and writing readable code that can be easily reviewed.

Here's a logic error. Can you find it? (Hint: the cause is a typo, but it's still a logic error because the code runs without throwing an error.)

In [3]:
for name, age in people.items():
    if age < 18:
        age_desc = 'a child'
    else:
        age_des = 'an adult'
    print(f'{name} is {age_desc}')

Daniela is a child
Zoe is a child
Rowen is a child
Jude is a child
Austin is a child


#### Syntax errors
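These are more obvious because your code will simply fail. There are lots of tools for figuring out where and why. (Strictly speaking, Python reports a true `SyntaxError`, like the one earlier in this notebook, before any code runs, while errors like the `ValueError` below are *exceptions* raised as the code runs. Either way, your program stops with an error message.)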

Error messages are usually the starting place for debugging a syntax error.

In [4]:
for name, age in people:
    if agete < 18:
        age_desc = 'a child'
    else:
        age_desc = 'an adult'
    print(f'{name} is {age_desc}')

ValueError: too many values to unpack (expected 2)

The full error message (called a *traceback*) tells us which line the problem is on.

Sometimes, it can be helpful to turn on line numbers.
- In Colab: `Tools -> Settings -> Editor -> Show line numbers`
- In JupyterLab: `View -> Show Line Numbers`

The `ValueError` tells us that the issue is related to the value of a variable on this line, but it's still pretty vague.

Time to start [Googling](https://www.google.com/).

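Another simple debugging technique is to print the values your code is actually working with. Printing what the loop receives from `people` reveals what Python is trying to unpack into `name` and `age`, and comparing it with `.items()` shows the difference:

```python
people = {
    'Daniela': 5,
    'Zoe': 10,
    'Rowen': 65,
    'Jude': 81,
    'Austin': 45,
}

# What does looping over the dictionary directly give us?
for item in people:
    print(repr(item))

# Compare with what .items() gives us
for item in people.items():
    print(repr(item))
```

Looping over a dictionary directly yields only its keys. Each key is a single string, and unpacking a 7-letter string like `'Daniela'` into two variables is what triggers `too many values to unpack (expected 2)`.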