# Python Fundamentals

Python is often described as a high-level, object-oriented language that emphasizes readability and simplicity. But what does this really mean? At its core, Python is designed to let you express ideas with minimal effort, much like having a natural conversation rather than deciphering complex machine instructions. Unlike Stata, which is command-driven, Python is more like a versatile toolbox, allowing you to construct solutions with reusable and flexible components.

A key concept in Python is that everything is an object: numbers, text, lists, functions, and even the modules you import are all objects with properties (attributes) and actions they can perform (methods). Think of everyday objects: a car has attributes (color, brand, speed) and methods (accelerate, brake, honk). Similarly, a string in Python holds data (its characters) and has methods such as `.upper()`, which returns an uppercase copy of it. This design makes Python intuitive and powerful. To "speak Python," you need to think in terms of objects and their interactions, just as you describe real-world objects and what they can do.


In [4]:
print("Hello CETers!")

Hello CETers!
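To make "everything is an object" concrete, here is a small sketch that calls a few methods on ordinary values. The variable names and values are arbitrary examples.

```python
text = "python"
number = 3.5
fruits = ["apple", "banana"]

# Methods are called with a dot after the object
print(text.upper())          # PYTHON: returns an uppercase copy of the string
print(number.is_integer())   # False: 3.5 has a fractional part
fruits.append("cherry")      # lists can modify themselves in place
print(fruits)                # ['apple', 'banana', 'cherry']

# Some information is obtained with built-in functions rather than methods
print(len(text))             # 6

# Even functions are objects, with attributes such as __name__
print(len.__name__)          # len
```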


### 0. Understanding what it means that Python is an object-oriented language: its logic and syntax, how data is stored, and conceptual differences compared to Stata

When we say Python is an object-oriented language, think of it like organizing your tools in a toolbox. Each tool is like an "object" – say, a hammer or screwdriver – with its unique purpose and features. In Python, everything you use is an object, whether it’s a simple number, a text string, or a complex data structure. These objects are based on "classes," which you can think of as blueprints for creating specific tools, defining what the objects can do and how they work.
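A quick way to see the class (blueprint) behind any object is the built-in `type()`, and `isinstance()` checks whether an object was built from a given class. Calling a class such as `int` or `list` creates a new object from that blueprint. A minimal illustration with arbitrary values:

```python
# type() reveals the class an object was created from
print(type(42))               # <class 'int'>
print(isinstance(42, int))    # True

# Calling a class creates a new object from that blueprint
n = int("42")                 # build an int from a string
empty = list()                # build a new, empty list
print(n + 1, empty)           # 43 []

# Two objects built from the same class are independent of each other
a = list()
b = list()
a.append(1)
print(a, b)                   # [1] []
```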

Because Python uses this approach, it is easier to organize and manage your code, especially as your projects get bigger. You can build your own custom tools (objects) by defining classes, and reuse them whenever needed. This contrasts with purely procedural programming, where a program is a step-by-step sequence of instructions operating on data, without bundling data and behavior together in blueprints.

Comparing Python to Stata, which is software specifically designed for statistical analysis, there's a notable difference. Stata is more like a powerful calculator that lets you directly perform statistical tasks with simple commands. It’s tailored for data analysis functions and is quite user-friendly for that purpose. You type a command, and it processes the data accordingly.

On the other hand, Python is more like a Swiss Army knife. It’s highly versatile and can handle a wide range of tasks beyond just data analysis, such as building websites, automating tasks, and more. This versatility comes with a bit more complexity because you might have to understand some programming concepts to fully take advantage of it. But it also means Python offers more flexibility and power when you need to create custom solutions or handle various tasks within a single environment.



### 1. Basic Data Types and Structures

Python has several built-in data types that serve as the foundation for all programming tasks. Here are the most common ones:

In [1]:
# Integers (int) – Whole numbers, positive or negative.
print("integer:", -10)

# Floating-Point Numbers (float) – Numbers with decimals. To create floats, we can just use a decimal point (.):
print("float:", 3.14)

# Strings (str) – Text enclosed in quotes.
print("string:", "Python")

# Booleans (bool) – Represent True or False values.
print("boolean:", True, False)

# Lists (list) – Ordered, mutable, iterable collections of items.
fruits = ["apple", "banana", "cherry"]
print("list:", fruits)

# Tuples (tuple) – Ordered, immutable, iterable collections.
coordinates = (40.7128, -74.0060)
print("tuple:", coordinates)

# Dictionaries (dict) – Key-value pairs for fast lookups.
person = {"name": "Alice", "age": 30}
print("dictionary:", person)

# Sets (set) – Unordered collections of unique values.
unique_numbers = {1, 2, 3, 3, 2} # → {1, 2, 3}
print("set:", unique_numbers)

integer: -10
float: 3.14
string: Python
boolean: True False
list: ['apple', 'banana', 'cherry']
tuple: (40.7128, -74.006)
dictionary: {'name': 'Alice', 'age': 30}
set: {1, 2, 3}
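The difference between mutable and immutable collections matters in practice. This short sketch reuses the example values above (the added city is an arbitrary value): lists can be changed in place, tuples cannot, dictionaries look values up by key, and sets discard duplicates.

```python
fruits = ["apple", "banana", "cherry"]
coordinates = (40.7128, -74.0060)
person = {"name": "Alice", "age": 30}

fruits.append("date")           # lists are mutable: add an item in place
fruits[0] = "apricot"           # ...or replace one by position (indexing starts at 0)
print(fruits)                   # ['apricot', 'banana', 'cherry', 'date']

try:
    coordinates[0] = 0.0        # tuples are immutable: this raises TypeError
except TypeError as err:
    print("tuple error:", err)

print(person["name"])           # dictionaries look values up by key: Alice
person["city"] = "Madrid"       # add a new key-value pair (arbitrary example value)
print(sorted(person))           # keys in alphabetical order: ['age', 'city', 'name']

print(set([1, 2, 2, 3, 3]))     # sets keep only unique values: {1, 2, 3}
```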


It's worth mentioning that float numbers often do not have an exact internal representation when they're stored in your machine. For instance:

In [10]:
print(0.1)
print(format(0.1, '.25f')) # print 0.1 rounding to its 25th decimal using the format function (functions explained later)

0.1
0.1000000000000000055511151


This is a fundamental concept in computer science, and it's not just Python. Almost all programming languages that use standard floating-point numbers (like `float` in Python) cannot store 0.1 as exactly 0.1. The reason lies in the difference between how humans represent numbers (decimal, base-10) and how computers represent them (binary, base-2). A decimal number can be represented exactly in binary floating-point only if it can be written as a fraction whose denominator is a power of 2 (and whose digits fit within the float's limited precision). 0.1 = 1/10 does not qualify, because 10 contains the factor 5.

This is not super important, but it's good to remember: when you print a float, what you see is Python's string representation (the shortest decimal text that maps back to the same stored value), not the exact internal value.

Python tries to make things more readable by formatting numbers in a user-friendly way. However, this can hide what’s really going on under the hood.
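To peek at what is stored under the hood, every float has an `as_integer_ratio()` method that returns the exact fraction it holds. A small sketch checking the power-of-2 rule described above:

```python
# as_integer_ratio() returns the exact fraction stored for a float
num, den = (0.1).as_integer_ratio()
print(num, den)        # 3602879701896397 36028797018963968
print(den == 2**55)    # True: the denominator is a power of 2, so this is not exactly 1/10

print((0.125).as_integer_ratio())  # (1, 8): 1/2**3, stored exactly
print((0.5).as_integer_ratio())    # (1, 2)
```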

Forgetting this can make results like the following look puzzling:

In [18]:
# "==" tests whether two values are equal and returns True or False
print(0.1 + 0.1 + 0.1 == 0.3) 

# By contrast, 0.125 is 1/8 = 1/2**3, a finite binary fraction, so it is stored exactly
print(0.125 + 0.125 + 0.125 == 0.375)

False
True


The fact that 0.1 cannot be stored exactly as 0.1 in binary floating-point does not mean that you can't perform arithmetic operations with it. It just means that the numbers involved in the calculation are incredibly close approximations of the true decimal value.

In [35]:
print(format(0.1 + 0.1 + 0.1, '.1f'))

0.3
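Because floats are close approximations, comparing them with `==` is fragile. A common pattern is to test whether two floats are close enough with `math.isclose()` from the standard library, or to use the standard `decimal` module when exact decimal arithmetic matters (for example, with money). A short sketch:

```python
import math
from decimal import Decimal

total = 0.1 + 0.1 + 0.1
print(total == 0.3)               # False: tiny representation error
print(math.isclose(total, 0.3))   # True: equal within a small relative tolerance
print(round(total, 2) == 0.3)     # True: rounding removes the tiny error here

# Decimal stores base-10 digits exactly when built from strings
exact = Decimal("0.1") + Decimal("0.1") + Decimal("0.1")
print(exact == Decimal("0.3"))    # True
```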


### 2. Variables

We use the `=` symbol as the assignment operator: it binds the name on the left to the value on the right.

When naming variables, we should follow naming conventions, i.e. best practices such as using lowercase letters and underscores. Python coders typically follow a standard convention, and it differs depending on what is being named (variables, constants, classes, and so on).

In particular, you should follow the PEP8 conventions: https://www.python.org/dev/peps/pep-0008/

You should try reading this guide once in a while (not all in one sitting, but just here and there as time permits - it's quite useful).
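A few of the PEP 8 naming conventions in action. All names below are made-up examples; the `keyword` module and `str.isidentifier()` are standard ways to check whether a name is allowed at all.

```python
import keyword

# Variables and functions: lowercase words separated by underscores (snake_case)
household_income = 2500


def compute_mean(values):
    return sum(values) / len(values)


# Constants: all capital letters
MAX_ITERATIONS = 100


# Classes: each word capitalized (CapWords)
class SurveyRecord:
    pass


# Names cannot be keywords or start with a digit
print(keyword.iskeyword("class"))   # True: 'class' cannot be used as a variable name
print("2nd_wave".isidentifier())    # False: names cannot start with a digit
```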

Assign values to variables

In [36]:
# Assigning an integer
age = 30

# Assigning a float
height = 5.9

# Assigning a string
name = "Alice"

# Assigning a boolean
is_student = True

Checking data types

In [38]:
print(type(age))        # Output: <class 'int'>
print(type(height))     # Output: <class 'float'>
print(type(name))       # Output: <class 'str'>
print(type(is_student)) # Output: <class 'bool'>

<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>


Type conversion: How to convert between different data types using functions like int(), float(), str(), and bool().

In [39]:
# Converting float to int
height_int = int(height)  # 5

# Converting int to float
age_float = float(age)    # 30.0

# Converting int to string
age_str = str(age)        # "30"

# Converting string to int
age_from_str = int(age_str)  # 30
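Conversions follow a few rules that can surprise newcomers. A sketch of the most common ones, using arbitrary values:

```python
print(int(5.9))           # 5: int() truncates toward zero, it does not round
print(int(-5.9))          # -5
print(round(5.9))         # 6: use round() if you want the nearest integer

print(int("30"))          # 30: strings holding whole numbers convert fine
print(float("5.9"))       # 5.9
print(int(float("5.9")))  # 5: convert to float first, then to int

try:
    int("5.9")            # a string with a decimal point is not a valid int
except ValueError as err:
    print("ValueError:", err)

# bool(): zero, empty strings and empty containers are False; most other values are True
print(bool(0), bool(""), bool([]))   # False False False
print(bool(1), bool("False"))        # True True: any non-empty string is True
```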


Dynamic typing: Understanding that Python variables can change type based on the assigned value.

In [40]:
# Variable initially holds an integer
var = 10
print(type(var))  # Output: <class 'int'>

# Now assigning a string to the same variable
var = "Ten"
print(type(var))  # Output: <class 'str'>


<class 'int'>
<class 'str'>
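Dynamic typing means the type belongs to the object, not to the name. Python is still strict about which operations can mix types: it will not silently combine a string and a number. A short sketch:

```python
var = 10
var = "Ten"            # the name now refers to a str object

try:
    result = "10" + 5  # mixing str and int raises TypeError instead of guessing
except TypeError as err:
    print("TypeError:", err)

# Convert explicitly to say what you mean
print("10" + str(5))   # 105 (string concatenation)
print(int("10") + 5)   # 15 (integer addition)
```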


Variable reassignment: How to update the value of an existing variable.

In [41]:
counter = 1
print(counter)  # Output: 1

counter = counter + 1
print(counter)  # Output: 2


1
2
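Updating a variable from its own value is so common that Python has shorthand (augmented assignment) operators such as `+=`, `-=` and `*=`. A small sketch:

```python
counter = 1
counter += 1      # same as: counter = counter + 1
print(counter)    # 2

counter *= 10     # same as: counter = counter * 10
print(counter)    # 20

# It also works with strings (and lists)
label = "wave"
label += "_1"
print(label)      # wave_1
```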


Suppose we now assign `x = 10`. From then on, `x` is a symbol (variable) in our program that we can invoke or recall.

In [28]:
x = 10
x

10

In [31]:
# We can use it in subsequent parts of our code:
y = x + 10
print(y)

20




As you can see, here we performed a calculation on the right-hand side and assigned the resulting value to the label on the left (`y`).

Note that the `=` symbol is not the same as the mathematical equality symbol (to compare values we use `==`), so something like the following is not valid Python, and we'll get an exception (an error):


In [32]:
10=10

SyntaxError: cannot assign to literal (2975699964.py, line 1)

In [34]:
# we can reassign a variable, even to a value of a different type
x = 3.14
print(x) # no longer x == 10

3.14


We should also be careful not to name our variables with labels that Python has already defined (the built-in names). For instance, we could accidentally reassign the symbol `float`, which Python uses to represent the float data type:

In [52]:
# Representing a float number
a = float(10)

# remapping the float symbol
float = 20

# now this raises an error: the name float points to our integer 20, hiding the built-in float (Python still has it, we just can't reach it by that name)
a = float(10)


TypeError: 'int' object is not callable

In [53]:
# we can simply delete our definition
del float

# no error now
a = float(10)
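If a built-in name does get shadowed, the original is still reachable through the standard-library `builtins` module, and you can check beforehand whether a name is already taken. A sketch; `shadows_builtin` is a hypothetical helper written for this example, not part of Python:

```python
import builtins

# The original float is always available here, even if the name float is reassigned
print(builtins.float(10))        # 10.0


def shadows_builtin(name):
    """Hypothetical helper: return True if name is already a Python built-in."""
    return hasattr(builtins, name)


print(shadows_builtin("float"))   # True: avoid this as a variable name
print(shadows_builtin("list"))    # True
print(shadows_builtin("income"))  # False: safe to use
```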

As mentioned above, the PEP 8 conventions (https://www.python.org/dev/peps/pep-0008/) are worth checking once in a while if you use Python extensively. A much shorter companion is the Zen of Python (PEP 20), a set of guiding principles for Python's design written as a poem, which you can display by typing the following:


In [55]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### 3. Built-in functions and methods for data analysis

Python has several built-in methods and functions that are useful for basic data analysis, even before importing any external libraries like pandas or numpy.

`sum(iterable)`: Returns the total sum.

`len(iterable)`: Returns the number of items.

`min(iterable)`: Returns the smallest item.

`max(iterable)`: Returns the largest item.

`sorted(iterable, key=None, reverse=False)`: Returns a sorted list.

`round(number[, ndigits])`: Rounds a number to a given precision.

In [59]:
# for example
data = [1, 2, 3, 4, 5]
average = sum(data) / len(data)
print(average)

3.0
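The other functions in the list work the same way on any iterable. A quick sketch with made-up numbers; note that `round()` sends exact halves to the nearest even integer (so-called banker's rounding), which differs from the round-half-up rule you may be used to:

```python
data = [4, 1, 5, 2, 3]

print(min(data), max(data))              # 1 5
print(sorted(data))                      # [1, 2, 3, 4, 5]
print(sorted(data, reverse=True))        # [5, 4, 3, 2, 1]
print(round(sum(data) / len(data), 1))   # 3.0

# key= sorts by a computed value, here the length of each word
words = ["banana", "fig", "apple"]
print(sorted(words, key=len))            # ['fig', 'apple', 'banana']

# round() sends exact halves to the nearest even integer
print(round(2.5), round(3.5))            # 2 4
```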


The data structures we saw above (`list`, `dict`, `set`, and `tuple`) also come with built-in methods, for instance `.append()` and `.remove()` for lists, `.keys()`, `.values()`, and `.items()` for dictionaries, and `.count()` and `.index()` for lists and tuples.

In [None]:
# For example
data = [1, 2, 2, 3]  # a list
print(data.count(2))  # counts how many times the value 2 appears in the list

2


In [72]:
# we can also combine a method (.count) with a comprehension to iterate over an object
# This is a dict comprehension, basically a loop in one line: a fast and readable way to transform data
counts = {x: data.count(x) for x in set(data)}
print(counts)

# a similar explicit loop, which prints each key-value pair as its own small dictionary
for x in set(data):
    print({x: data.count(x)})

{1: 1, 2: 2, 3: 1}
{1: 1}
{2: 2}
{3: 1}
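Here is a short sketch of the methods mentioned above, grouped by the data structure they belong to (values are arbitrary):

```python
# list methods
data = [1, 2, 2, 3]
data.append(4)          # add an item at the end
data.remove(2)          # remove the first occurrence of 2
print(data)             # [1, 2, 3, 4]
print(data.index(3))    # 2: position of the first 3 (counting from 0)

# tuple methods (tuples only have count and index, since they cannot change)
point = (1, 2, 1)
print(point.count(1))   # 2

# dict methods
person = {"name": "Alice", "age": 30}
print(list(person.keys()))    # ['name', 'age']
print(list(person.values()))  # ['Alice', 30]
for key, value in person.items():
    print(key, "->", value)   # name -> Alice, then age -> 30

# set methods
seen = {1, 2}
seen.add(3)
print(seen)             # {1, 2, 3}
```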


Python also ships with a standard library of built-in modules that we can simply import and use, with no installation needed. Third-party libraries (such as pandas), however, we need to install ourselves.

In [82]:
# basic stats
import statistics
print("This is statistics:", statistics.mean([1, 2, 3]))

# useful data structures. Others might be defaultdict and namedtuple
from collections import Counter
print("This is Counter:", Counter(['a','b','a']))

# advanced iteration tools
from itertools import combinations
items = ['a', 'b', 'c']
result = list(combinations(items, 2))
print("This is itertools", result)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]

# for mathematical functions
import math
print("This is math:", math.sqrt(16))

# for date/time operations
from datetime import datetime
now = datetime.now()
print("This is datetime:", now)

# string methods for cleaning and formatting (built in, no import needed): str.strip(), str.lower(), str.upper(), str.replace(), str.split(), str.join()
cleaned = [s.strip().lower() for s in ['  A ', 'B  ', ' c']]
print("This is lower:", cleaned)

This is statistics: 2
This is Counter: Counter({'a': 2, 'b': 1})
This is itertools [('a', 'b'), ('a', 'c'), ('b', 'c')]
This is math: 4.0
This is datetime: 2025-06-03 18:48:07.048322
This is lower: ['a', 'b', 'c']


However, we will likely be doing more than simple operations like the above, so we'll typically use:

`pandas` for data frames (more similar to Stata)

`numpy` for numerical arrays

`matplotlib` / `seaborn` for visualization

But before that, it's useful to understand how custom functions and objects work in order to leverage the power to organise, reuse and simplify code.

In [83]:
# the basic syntax is as follows:
def function_name(parameters):
    # code block
    return result  # (optional)

In [85]:
# Simple function 1
def greet():
    print("Hello!")
    
greet()

Hello!


In [87]:
# Simple function 2
def greet(name):
    print(f"Hello, {name}!")

greet("Borja")

Hello, Borja!


In [88]:
# Simple function 3
def square(x):
    return x * x

result = square(4)
print(result)

16
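Functions can also give parameters default values, be called with keyword arguments, and return several values at once (as a tuple). A sketch with a hypothetical `summarize` function written for this example:

```python
def summarize(values, ndigits=2):
    """Hypothetical example: return the mean, min and max of a list of numbers."""
    mean = sum(values) / len(values)
    return round(mean, ndigits), min(values), max(values)


# Unpack the returned tuple into three variables
avg, lowest, highest = summarize([3, 1, 4, 1, 5])
print(avg, lowest, highest)             # 2.8 1 5

# Override the default with a keyword argument
print(summarize([1, 2, 2], ndigits=1))  # (1.7, 1, 2)

# The docstring is available through help() or __doc__
print(summarize.__doc__)
```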


We use this a lot in our daily data analysis. See for instance a real function that we have used in several cases.

In [90]:
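import math
from geopy.distance import geodesic  # third-party package: pip install geopy

def calculate_distance(row):
    '''Calculate the distance in km between origin and destination coordinates.'''
    coords = [row['origin_lat'], row['origin_lon'], row['dest_lat'], row['dest_lon']]
    # return None if any coordinate is missing (None, or NaN as pandas stores missing numbers)
    if any(c is None or (isinstance(c, float) and math.isnan(c)) for c in coords):
        return None
    origin_coords = (row['origin_lat'], row['origin_lon'])
    dest_coords = (row['dest_lat'], row['dest_lon'])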
    return geodesic(origin_coords, dest_coords).kilometers  # Calculate distance in kilometers
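The function above needs the third-party `geopy` package. To try the same row-by-row logic without it, here is a sketch that swaps geopy's ellipsoidal `geodesic` for the simpler haversine great-circle formula, which treats the Earth as a sphere (radius assumed to be 6,371 km), so results differ slightly from geopy's. The rows are plain dictionaries with the same column names; with pandas, a function like this is typically applied to each row with `df.apply(calculate_distance, axis=1)`.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine_distance(row):
    """Great-circle distance in km between origin and destination (sphere model)."""
    coords = [row['origin_lat'], row['origin_lon'], row['dest_lat'], row['dest_lon']]
    if any(c is None for c in coords):
        return None  # missing coordinate: no distance
    lat1, lon1, lat2, lon2 = map(math.radians, coords)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical rows, mimicking what df.apply(..., axis=1) passes to the function
rows = [
    {'origin_lat': 0.0, 'origin_lon': 0.0, 'dest_lat': 0.0, 'dest_lon': 1.0},
    {'origin_lat': 0.0, 'origin_lon': 0.0, 'dest_lat': None, 'dest_lon': 1.0},
]
distances = [haversine_distance(r) for r in rows]
print(distances)  # first value is about 111.19 km (one degree of longitude at the equator), second is None
```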

Making custom functions that are efficient, nice, and useful is not easy, but I usually try to keep the following in mind:
- Think of functions like mini-programs: input → process → output.
- Use them to avoid repeating code.
- Use clear names to describe what they do.

### BONUS: Creating objects

Object-oriented programming is a programming style based on objects: data (attributes) and behaviors (methods/functions) bundled together. In Python, you create objects from classes. This is how libraries such as pandas build their tools (a pandas DataFrame, for example, is an object created from the `DataFrame` class), and understanding how to make classes is one of the most powerful tools one can have in Python.

A class is a blueprint for creating objects. Imagine you're designing a car in code. A real car has:
- Attributes (measurements or state): color, speed, brand
- Methods (actions or behaviors): drive forward, drive backward, stop

In [93]:
class Car:
    def __init__(self, brand, color): # __init__ runs automatically when a new Car is created and sets its starting attributes
        self.brand = brand      # attribute
        self.color = color      # attribute

    def forward(self):          # method
        print("Car is moving forward.")

    def backward(self):         # method
        print("Car is reversing.")

Now we create an actual car based on that blueprint:

In [94]:
my_car = Car("Toyota", "red")

my_car.forward()   # Output: Car is moving forward.
my_car.backward()  # Output: Car is reversing.

Car is moving forward.
Car is reversing.


So my_car is now an object of type Car. It has measurements (brand, color) and can perform actions (forward, backward).
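Attributes can be read and changed with the same dot notation, and methods can update them; this is what bundling data and behavior together means in practice. A sketch that extends the class with a hypothetical `speed` attribute and a made-up `accelerate` method (the class is redefined so the block runs on its own):

```python
class Car:
    def __init__(self, brand, color):
        self.brand = brand
        self.color = color
        self.speed = 0          # hypothetical attribute: every new car starts stopped

    def accelerate(self, amount):
        """Hypothetical method: increase speed by amount (km/h)."""
        self.speed += amount


my_car = Car("Toyota", "red")
print(my_car.brand, my_car.color)   # Toyota red

my_car.color = "blue"                # attributes can be changed directly
my_car.accelerate(30)                # or through methods
my_car.accelerate(20)
print(my_car.color, my_car.speed)    # blue 50

other_car = Car("Seat", "white")
print(other_car.speed)               # 0: each object keeps its own state
```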

You can only do what the class allows. If you try something it wasn’t designed for, Python raises an error:

In [95]:
my_car.sideways()  # ❌ Error: 'Car' object has no attribute 'sideways'

AttributeError: 'Car' object has no attribute 'sideways'

One of the useful parts of object-oriented programming is inheritance. Let’s say we want to make a RaceCar, which is just like a Car, but with extra power.

We can inherit all the attributes and methods from Car:

In [96]:
class RaceCar(Car):  # Inherit from Car
    def boost(self):
        print("RaceCar is boosting speed!")

Now the new object has everything a Car has, plus more:

In [97]:
fast_car = RaceCar("Ferrari", "yellow")
fast_car.forward()  # Inherited from Car
fast_car.boost()    # New method in RaceCar

Car is moving forward.
RaceCar is boosting speed!
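A subclass can also add its own attributes and replace (override) inherited methods, while `super()` calls the parent class's version so we don't have to repeat it. A sketch with a hypothetical `top_speed` attribute (both classes are redefined so the block runs on its own):

```python
class Car:
    def __init__(self, brand, color):
        self.brand = brand
        self.color = color

    def forward(self):
        print("Car is moving forward.")


class RaceCar(Car):
    def __init__(self, brand, color, top_speed):
        super().__init__(brand, color)   # reuse Car's setup for brand and color
        self.top_speed = top_speed       # hypothetical extra attribute

    def forward(self):                   # override: replaces Car.forward for race cars
        print(f"{self.brand} races forward at up to {self.top_speed} km/h!")


fast_car = RaceCar("Ferrari", "yellow", 340)
fast_car.forward()                       # Ferrari races forward at up to 340 km/h!
print(isinstance(fast_car, Car))         # True: a RaceCar is still a Car
```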


While not essential for our day-to-day data analysis, investing time in developing custom modules for the CET to use in recurring tasks could unlock a range of exciting and valuable projects.