Robbie Hume edited this page May 12, 2025 · 91 revisions

Table of Contents

  1. General Notes
  2. Python Environment Notes
  3. Specific version features
  4. Python Language Notes
  5. Add to google colab
  6. requests, urllib(3), & http modules
  7. Concurrent, asynchronous, multiprocessing
  8. Performance / memory considerations
  9. itertools
  10. Misc.
  11. Do specific things

Colab Notebooks

Links

General resource websites:


Look into

General Notes

  • To be FIPS compliant, use hashlib.sha256() instead of hashlib.md5() (MD5 is not a FIPS-approved hash algorithm)
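A minimal sketch of the swap using only the standard hashlib API. The helper names sha256_hex and sha256_file are hypothetical. Note that a SHA-256 hex digest is 64 characters long versus 32 for MD5, so any storage sized for MD5 hashes may need updating.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    # FIPS-approved replacement for hashlib.md5(data).hexdigest()
    return hashlib.sha256(data).hexdigest()


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    # Hash a file in chunks so large files don't need to fit in memory
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


print(sha256_hex(b"abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```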

Python Environment Notes

pip

  • Install pip: link
  • PyPI (the Python Package Index) is the central software repository that pip downloads packages from
  • pip installs packages, which are usually bundled into 'wheels' (.whl) before installation
  • A wheel is a zip-style archive that contains all the files necessary for a typical package installation
  • It may be best to run it as python3 -m pip instead of using the system-installed pip
    • Why you should use python -m pip
    • This executes pip using the Python interpreter you specified as python
    • This is beneficial because you know exactly which pip (and which Python) is being used when multiple Python versions are installed

Tips

  • See package dependencies: pipdeptree
    • Install it first with pip install pipdeptree
  • Loop through the first n elements of a list: for item in itertools.islice(my_list, n)

Wheel files (.whl)

Virtual environments (venv)

  • Complete guide to python virtual environments
  • Virtual environments are used when you want to be explicit about a project's packages and limit them to that project
  • You should never install packages into your global Python interpreter when you develop locally
  • To create an environment: python -m venv <venv name>
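The same thing can be done from Python with the standard venv module. This is a minimal sketch, assuming you only want a bare environment without pip; the helper names and the "demo-venv" directory are hypothetical. The sys.prefix check is a common way to tell whether the current interpreter is running inside a venv.

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation
    return sys.prefix != sys.base_prefix


def create_env(env_dir: Path) -> Path:
    # Roughly equivalent to `python -m venv --without-pip <env_dir>`;
    # symlinks mirror the CLI default on non-Windows systems
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return env_dir / "pyvenv.cfg"


print("Running inside a venv:", in_virtualenv())
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / "demo-venv")
    print("pyvenv.cfg created:", cfg.exists())
```

After creating an environment from the command line, activate it with source <venv name>/bin/activate (macOS/Linux) or <venv name>\Scripts\activate (Windows).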

Python Language Notes

Variables and parameter passing

  • Deep dive into variables in Python
  • Python uses pass-by-object-reference for function parameter passing
    • Mutating the object held in the parameter (e.g., appending to a list) changes the original variable's object too
    • Reassigning the parameter inside the function is not reflected in the original variable
  • When passing a mutable object (list, dict, set, class instance, etc.), pass a copy if you don't want changes made inside the function to modify the original
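A short sketch of the three cases above: mutation is visible to the caller, reassignment is not, and passing a copy protects the original. The function names are illustrative only.

```python
import copy


def append_item(items: list) -> None:
    # Mutates the object the caller passed in, so the caller sees the change
    items.append(4)


def reassign(items: list) -> None:
    # Rebinds the local name only; the caller's list is untouched
    items = [99]


nums = [1, 2, 3]
append_item(nums)
print(nums)  # [1, 2, 3, 4]

reassign(nums)
print(nums)  # [1, 2, 3, 4]

# Pass a shallow copy to protect the original
append_item(nums.copy())
print(nums)  # [1, 2, 3, 4]

# A shallow copy still shares nested lists, so use deepcopy for nested objects
nested = [[1, 2], [3]]
clone = copy.deepcopy(nested)
clone[0].append(99)
print(nested)  # [[1, 2], [3]]
```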

Duck typing

  • Duck typing is a programming concept where the suitability of an object is determined by the presence of certain methods and properties, rather than the object's actual type
  • Instead of checking an object's type explicitly, code assumes that if it "quacks like a duck" (i.e., has the necessary behavior), it can be used in the desired context
  • Key points:
    • Behavior over type: The focus is on what an object can do, not what class it belongs to.
    • Flexibility: Allows functions to operate on any object that implements the expected interface, promoting reusable and adaptable code.
    • Python's dynamic nature: Python commonly uses duck typing, enabling developers to write generic functions that work with various objects as long as they provide the required attributes or methods
  • Ex:
    • def quack_and_walk(duck):
          duck.quack()
          duck.walk()
      
      class Duck:
          def quack(self):
              print('Quack!')
      
          def walk(self):
              print('Waddle waddle')
      
      class Person:
          def quack(self):
              print('I can quack like a duck!')
      
          def walk(self):
              print('I walk on two legs')
      
      # Both objects work with quack_and_walk() because they implement quack() and walk()
      quack_and_walk(Duck())
      quack_and_walk(Person())

Dunder Methods (aka Magic Methods)

  • Dunder methods are methods that allow instances of a class to interact with the built-in functions and operators of the language
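A minimal sketch of a few common dunder methods; the Vector class is a hypothetical example, not from a library. Each method is what Python calls behind the scenes for the built-in or operator named in its comment.

```python
class Vector:
    """Small 2D vector showing how dunder methods hook into built-ins and operators."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        # Used by repr() and when the object is displayed in the REPL
        return f"Vector({self.x}, {self.y})"

    def __eq__(self, other: object) -> bool:
        # Used by the == operator
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other: "Vector") -> "Vector":
        # Used by the + operator
        return Vector(self.x + other.x, self.y + other.y)

    def __abs__(self) -> float:
        # Used by the built-in abs()
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def __bool__(self) -> bool:
        # Used by truth testing (if v:, not v); the zero vector is falsy
        return bool(self.x or self.y)


v = Vector(3, 4)
print(v + Vector(1, 1))    # Vector(4, 5)
print(abs(v))              # 5.0
print(bool(Vector(0, 0)))  # False
```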

** (un)packing
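A short sketch of ** unpacking and packing with dictionaries; the function and variable names are illustrative.

```python
def describe(name: str, age: int, city: str = "unknown") -> str:
    return f"{name}, {age}, from {city}"


# Unpacking: ** spreads a dict's key/value pairs into keyword arguments
person = {"name": "Ada", "age": 36, "city": "London"}
print(describe(**person))  # Ada, 36, from London

# Unpacking inside a dict literal merges dicts (later keys win)
defaults = {"color": "blue", "size": "M"}
overrides = {"size": "L"}
merged = {**defaults, **overrides}
print(merged)  # {'color': 'blue', 'size': 'L'}


# Packing: **kwargs collects any extra keyword arguments into a dict
def collect(**kwargs) -> dict:
    return kwargs


print(collect(a=1, b=2))  # {'a': 1, 'b': 2}
```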

Exceptions; try / except

  • Raising / handling custom exception:
    • class ExampleException(Exception):
          pass
      
      def process(var):
          try:
              if not var:
                  raise ExampleException('var is empty')
              # do work
          except ExampleException as err:
              # catch the specific class instead of checking type(err) inside a broad except
              print(f'Handled: {err}')
          finally:
              # always runs regardless (even if the other blocks have a return statement)
              print('inside finally')
    • The finally block always runs regardless, even if the other blocks have a return statement
      • If the finally block also has a return statement, its value overrides the return value from the try/except block
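A small sketch of the execution order described above, plus the return-override case. The events list is just a hypothetical trace for illustration; returning from finally is generally discouraged because it silently discards the other return value (and any in-flight exception).

```python
events = []


def run(fail: bool) -> str:
    try:
        events.append("try")
        if fail:
            raise ValueError("boom")
        return "try result"
    except ValueError:
        events.append("except")
        return "except result"
    finally:
        # Runs after the return value is computed but before the function exits
        events.append("finally")


def finally_overrides() -> str:
    try:
        return "try result"
    finally:
        # Discouraged: this silently replaces the try block's return value
        # (newer Python versions may emit a SyntaxWarning for it)
        return "finally result"


print(run(False), events)   # try result ['try', 'finally']
events.clear()
print(run(True), events)    # except result ['try', 'except', 'finally']
print(finally_overrides())  # finally result
```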

Emptiness / None check

  • if not nums is preferred and quicker than if len(nums) == 0
  • If you need to check explicitly for None, use if nums is None (an empty list is falsy, but it is not None)
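A minimal sketch showing why the two checks differ; the classify function is hypothetical.

```python
def classify(nums):
    # Check None first: an empty list is falsy but is not None
    if nums is None:
        return "missing"
    if not nums:
        return "empty"
    return f"{len(nums)} items"


print(classify(None))    # missing
print(classify([]))      # empty
print(classify([1, 2]))  # 2 items
```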

Add to google colab:

  • String methods: use .startswith() and .endswith() instead of string slicing
  • Create a dictionary from two different sequences / collections: dict(zip(list1, list2))
  • Update multiple key/value pairs in a dictionary: d1.update(key1=val1, key2=val2) or d1.update({'key1': 'val1', 'key2': 'val2'})
    • Ex: class_grades.update(algebra='A', english='B')
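A quick sketch of the three tips together; the file name and grade data are made up. As an extra detail, startswith() and endswith() also accept a tuple, which checks several prefixes or suffixes at once.

```python
filename = "report_2024.csv"
# Clearer than filename[:7] == "report_"
print(filename.startswith("report_"))       # True
print(filename.endswith((".csv", ".tsv")))  # True (a tuple checks several suffixes)

subjects = ["algebra", "english"]
grades = ["A", "B"]
class_grades = dict(zip(subjects, grades))
print(class_grades)  # {'algebra': 'A', 'english': 'B'}

# Overwrites an existing key and adds a new one in a single call
class_grades.update(algebra="A+", history="C")
print(class_grades)  # {'algebra': 'A+', 'english': 'B', 'history': 'C'}
```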

Dictionaries:

Generators:

Classes / OOP

  • To access the base class methods and attributes, you can use super()
    • Ex: calls both display_info() functions
      class Polygon:
          def __init__(self, sides):
              self.sides = sides
      
          def display_info(self):
              print("A polygon is a two dimensional shape with straight lines.")
      
      class Triangle(Polygon):
          def display_info(self):
              print("A triangle is a polygon with 3 edges.")
              super().display_info() # call the display_info() method of Polygon
  • If you define an __init__() constructor in the child class, you need to call super().__init__() inside it so that the attributes from the parent class are initialized
    • Ex: Student.__init__() also runs Person.__init__() to set name
      class Person:
          def __init__(self, name):
              self.name = name
      
      class Student(Person):
          def __init__(self, name, student_id):
              super().__init__(name)  # initialize the Person attributes
              self.student_id = student_id
  • The @staticmethod decorator defines a static method: it takes no self (or cls) parameter and can be called on the class or on an instance
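A minimal sketch contrasting @staticmethod with the related @classmethod decorator; the Temperature class is hypothetical. The static method is essentially a plain function grouped inside the class, while the class method receives the class itself as cls, which makes it a natural fit for alternative constructors.

```python
class Temperature:
    def __init__(self, celsius: float) -> None:
        self.celsius = celsius

    @staticmethod
    def c_to_f(celsius: float) -> float:
        # No self or cls: a plain function namespaced inside the class
        return celsius * 9 / 5 + 32

    @classmethod
    def from_fahrenheit(cls, fahrenheit: float) -> "Temperature":
        # Receives the class as cls, so it works as an alternative constructor
        return cls((fahrenheit - 32) * 5 / 9)

    def to_f(self) -> float:
        # Static methods can also be called through self (or an instance)
        return self.c_to_f(self.celsius)


print(Temperature.c_to_f(100))                   # 212.0
print(Temperature.from_fahrenheit(212).celsius)  # 100.0
```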

Python Code Organization: Classes, Files, Modules, Packages, Libraries, Frameworks, and Imports

  • modules ⊆ packages ⊆ libraries ⊆ frameworks

Python Files (.py)

  • A Python file (.py) contains Python code (functions, variables, classes, etc.)
  • It can be executed directly or imported as a module
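The usual way to let one file work both ways is the `if __name__ == "__main__":` guard. A minimal sketch, where my_script.py and its functions are hypothetical names:

```python
# my_script.py (hypothetical file name)


def greet(name: str) -> str:
    return f"Hello, {name}!"


def main() -> None:
    print(greet("world"))


# __name__ is "__main__" only when the file is run directly (python my_script.py);
# when imported (import my_script), __name__ is "my_script" and main() does not run
if __name__ == "__main__":
    main()
```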

Modules (single file)

  • A module is simply a Python file (.py) that can be imported into another Python file
  • Modules help organize and reuse code
  • Standard-library modules (like math, os) ship with Python, so they don't need to be installed

Packages (directory with modules)

  • A package is a directory containing multiple modules and an __init__.py file (optional in Python 3.3+)
  • Helps organize large projects
  • Example Package Structure:
      my_project/
      │── main.py
      │── my_package/
      │   │── __init__.py
      │   │── module1.py
      │   │── module2.py
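To see how imports work with that layout, here is a sketch that builds the same my_package structure in a temporary directory and imports it. The module contents (hello, shout) are hypothetical; adding the root directory to sys.path stands in for running main.py from my_project/.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the my_package layout from above in a temporary directory
root = Path(tempfile.mkdtemp())
pkg = root / "my_package"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "module1.py").write_text("def hello():\n    return 'hello from module1'\n")
# Inside a package, modules can use relative imports (the leading dot)
(pkg / "module2.py").write_text(
    "from .module1 import hello\n\n"
    "def shout():\n"
    "    return hello().upper()\n"
)

# main.py would sit in root; putting root on sys.path mimics running from there
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# What main.py would contain: an absolute import from the package
from my_package import module2

print(module2.shout())  # HELLO FROM MODULE1
```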
    

Libraries (Collection of Modules & Packages)

  • A collection of modules or packages that provide reusable functionality
  • Examples: requests, numpy, flask

Frameworks (Structured Library for a Purpose)

  • A structured collection of libraries and conventions that help build applications
  • Examples: Django (web development), Flask (web framework), PyTorch (machine learning)

requests, urllib(3), & http modules

Newer / higher-level: urllib3 vs requests

  • urllib3 and requests
  • Feature                    | urllib3                                      | requests
    ---------------------------|----------------------------------------------|-----------------------------------------
    Level of Control           | High (Low-level, customizable)               | Medium (High-level, abstracted API)
    Ease of Use                | Easier than urllib, more setup than requests | Very easy, user-friendly
    Connection Pooling         | Automatic, efficient pooling                 | Automatic, abstracted
    Performance                | Lightweight, efficient for many requests     | Slightly heavier due to extra features
    Retries and Timeouts       | Customizable, easy to configure              | Built-in, simplified
    Code Readability           | More readable than urllib, but still verbose | Highly readable, concise
    Community/Documentation    | Moderate community, adequate docs            | Large community, extensive documentation
    Dependencies               | Few dependencies                             | More dependencies (heavier library)
    Handling JSON and Sessions | Basic handling                               | Built-in, user-friendly

requests

  • requests Session to keep state
  • For most use cases, requests is the best choice. It is:
    • User-friendly: Easy to use with a simple API
    • Readable: Produces clean, concise, and maintainable code
    • Feature-rich: Includes built-in support for sessions, cookies, retries, and JSON handling
    • Community: Has extensive documentation and a large user base

urllib3

  • placeholder

Older / lower-level: urllib vs http(.client)

  • Useful when you need to avoid third-party dependencies or require very low-level control

  Comparison table

  • Feature                    | urllib                              | http.client
    ---------------------------|-------------------------------------|--------------------------------------
    Level of Control           | High (Low-level API, manual setup)  | Very High (Raw HTTP control)
    Ease of Use                | Complex, verbose                    | More complex, highly verbose
    Connection Pooling         | No built-in pooling                 | Manual connection handling
    Performance                | Lightweight but verbose             | Lightweight, but requires manual setup
    Retries and Timeouts       | Manual setup                        | Manual setup
    Code Readability           | Verbose, not user-friendly          | Highly verbose, least readable
    Community/Documentation    | Limited (older, standard lib)       | Limited (low-level library)
    Dependencies               | No external dependencies (built-in) | No external dependencies (built-in)
    Handling JSON and Sessions | Manual handling                     | Manual handling
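To illustrate the "manual handling" rows, here is a sketch of a JSON POST with the standard-library urllib.request. The URL and payload are hypothetical placeholders, and the send() helper is not called in the example so nothing hits the network.

```python
import json
import urllib.request


def build_json_request(url: str, payload: dict) -> urllib.request.Request:
    # urllib has no json= shortcut: serialize the body and set the header yourself
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    # Timeouts are manual too (pass one on every call); retries would need your own loop
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_json_request("https://example.com/api", {"name": "Ada"})
print(req.get_method(), req.full_url)  # POST https://example.com/api
print(req.data)                        # b'{"name": "Ada"}'
```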

http

Concurrent, asynchronous, multiprocessing

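  • concurrent.futures makes it easy to run certain parts of a mostly synchronous program concurrently, using thread pools (ThreadPoolExecutor, suited to I/O-bound work) or process pools (ProcessPoolExecutor, suited to CPU-bound work)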
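A minimal sketch of both common ThreadPoolExecutor patterns. slow_square() is a hypothetical stand-in for real I/O-bound work. For CPU-bound work you could swap in ProcessPoolExecutor, which requires picklable, module-level functions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(n: int) -> int:
    # Stand-in for I/O-bound work such as a network call (hypothetical workload)
    time.sleep(0.1)
    return n * n


def squares_in_order(numbers: list[int]) -> list[int]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() returns results in input order, not completion order
        return list(pool.map(slow_square, numbers))


def squares_as_completed(numbers: list[int]) -> dict[int, int]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # submit() returns a Future per call; as_completed() yields them as they finish
        futures = {pool.submit(slow_square, n): n for n in numbers}
        return {futures[f]: f.result() for f in as_completed(futures)}


print(squares_in_order([1, 2, 3, 4]))  # [1, 4, 9, 16]
```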

Performance / memory optimization considerations

  • Membership tests (x in s) are O(1) on average for sets but O(n) for lists, so use a set for repeated lookups
  • List comprehensions / generator expressions are typically faster than filter() / map() / reduce() combinations, especially when those would need a lambda
  • Generator expressions are preferred over list comprehensions when you only need to iterate over the values once
    • Generator expressions produce values on the fly, so they avoid creating an intermediate list and are much more memory-efficient
    • However, they are not always faster: for small datasets a list comprehension is often quicker because of the generator's per-item overhead
  • if not my_list is ~2x faster than if len(my_list) == 0
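A small sketch of the memory and lookup points above; the data sizes are arbitrary. sys.getsizeof() measures only the container object itself, which is enough to show that a list grows with its length while a generator object stays small.

```python
import sys

numbers = range(100_000)

squares_list = [n * n for n in numbers]  # materializes every value now
squares_gen = (n * n for n in numbers)   # computes values lazily on demand

# The list's size grows with its length; the generator object stays small
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# Both give the same result when consumed once (a generator can't be reused)
print(sum(squares_list) == sum(squares_gen))  # True

# Membership: a set hashes the value, a list scans element by element
allowed_list = list(range(10_000))
allowed_set = set(allowed_list)
print(9_999 in allowed_list, 9_999 in allowed_set)  # True True
```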

itertools

itertools tutorial / documentation

Note: all examples assume import itertools (and import operator where operator is used); the operator module is not required to use itertools

  • accumulate(): makes an iterator that returns accumulated results: running sums by default, or the running results of a two-argument function
    • itertools.accumulate(iterable[, func])
    • Passing a function
      data = [1, 2, 3, 4, 5]
      result = itertools.accumulate(data, operator.mul)
      print(list(result))   # [1, 2, 6, 24, 120]
    • Without passing a function (defaults to summation)
      result = itertools.accumulate(data)
      print(list(result))   # [1, 3, 6, 10, 15]
  • combinations(): takes an iterable and an integer r and returns every unique combination of r elements (order within a combination doesn't matter)
    • itertools.combinations(iterable, r)
      shapes = ['circle', 'triangle', 'square',]
      result = itertools.combinations(shapes, 2)
      print(list(result))   # [('circle', 'triangle'), ('circle', 'square'), ('triangle', 'square')]
  • count(): makes an iterator that returns evenly spaced values, starting with start
    • Similar to range(), but unbounded (there is no stop value), so you must break out of the loop yourself; like range(), it produces values lazily
    • itertools.count(start=0, step=1)
      for i in itertools.count(10,3):
          print(i)
          if i > 20:
              break
      # 10, 13, 16, 19, 22  (as individual lines)
  • cycle(): cycles through an iterable endlessly
    • itertools.cycle(iterable)
      colors = ['red', 'orange', 'yellow', 'green']
      for color in itertools.cycle(colors):
          print(color)
      # red, orange, yellow, green, red, orange, ...  (as individual lines)
  • chain(): joins several iterables into one iterator, yielding every item from the first, then the second, and so on
    • itertools.chain(*iterables)
      warm = ['red', 'orange', 'yellow']
      cool = ['green', 'blue']
      for color in itertools.chain(warm, cool):
          print(color)
      # red, orange, yellow, green, blue  (as individual lines)
  • islice():
    • Similar to index slicing ([:x]), but works lazily on any iterable, including infinite and non-indexable ones (negative indices are not supported)
    • itertools.islice(iterable, stop) or itertools.islice(iterable, start, stop[, step])
      colors = ['red', 'orange', 'yellow', 'green']
      for color in itertools.islice(colors, 2):
          print(color)
      # red, orange (as individual lines)
  • permutations(): returns every ordering of r elements (r defaults to the length of the iterable)
    • itertools.permutations(iterable, r=None)
      alpha_data = ['a', 'b', 'c']
      result = itertools.permutations(alpha_data)
      list(result)  # [('a', 'b', 'c'), ('a', 'c', 'b'), ('b', 'a', 'c'), ('b', 'c', 'a'), ('c', 'a', 'b'), ('c', 'b', 'a')]
  • product(): creates the Cartesian product of the input iterables
    • itertools.product(*iterables, repeat=1)
      num_data = [1, 2, 3]
      alpha_data = ['a', 'b', 'c']
      result = itertools.product(num_data, alpha_data)
      list(result)  # [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c'), (3, 'a'), (3, 'b'), (3, 'c')]
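Because count() and cycle() never stop on their own, they are often bounded with islice() or zip() instead of a manual break. A short sketch reusing the data from the examples above:

```python
import itertools

# First 5 values of an infinite arithmetic sequence
print(list(itertools.islice(itertools.count(10, 3), 5)))  # [10, 13, 16, 19, 22]

# First 6 colors from an endless cycle
colors = ['red', 'orange', 'yellow', 'green']
print(list(itertools.islice(itertools.cycle(colors), 6)))
# ['red', 'orange', 'yellow', 'green', 'red', 'orange']

# zip() stops at the shortest input, so it also bounds an infinite iterator
for number, color in zip(itertools.count(1), colors):
    print(number, color)  # 1 red / 2 orange / 3 yellow / 4 green (as individual lines)
```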

Misc.

Version features

3.8

  • Assignment expressions (walrus operator): := assigns a value to a variable as part of an expression, such as an if or while condition
    • Ex: if (y := 2) > 1: # sets y = 2 and evaluates the expression as 2 > 1
    • Ex: while (user_input := input("Enter text: ")) != "stop": # keeps getting user input until "stop" is entered
    • Can also use it in list comprehensions: [result for i in range(5) if (result := func(i)) == True]
      • It is more efficient because it calls func() once per element, whereas [func(i) for i in range(5) if func(i) == True] calls it a second time for every element that passes the filter (up to twice as many calls)
  • f-string improvements: Now supports the = specifier for debugging (f"{var=}")
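A sketch that counts the function calls to show the walrus saving, plus the f-string = specifier. expensive() is a hypothetical stand-in for a costly function.

```python
calls = []


def expensive(n: int) -> int:
    calls.append(n)  # record every call so we can count them
    return n * n


# Walrus: expensive() runs once per element
big = [y for x in range(6) if (y := expensive(x)) > 5]
print(big, len(calls))  # [9, 16, 25] 6

# Without it: expensive() runs again for every element that passes the filter
calls.clear()
big_again = [expensive(x) for x in range(6) if expensive(x) > 5]
print(big_again, len(calls))  # [9, 16, 25] 9

# f-string "=" specifier prints the expression text and its value
value = 42
print(f"{value=}")  # value=42
```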

3.9

  • Dictionary union operators (| and |=):
    • d1 | d2 returns a new dictionary containing the union of d1 and d2 (values from d2 win for duplicate keys)
    • d1 |= d2 does an in-place (update) union, making d1 equal to the resulting union
  • .removeprefix() and .removesuffix(): methods to simplify removing prefixes and suffixes from strings
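A short sketch of the 3.9 additions; the dictionaries and file names are made-up sample data.

```python
defaults = {"theme": "light", "lang": "en"}
user = {"theme": "dark"}

merged = defaults | user  # new dict; right-hand values win on duplicate keys
print(merged)             # {'theme': 'dark', 'lang': 'en'}

defaults |= {"lang": "fr"}  # in-place update of defaults
print(defaults)             # {'theme': 'light', 'lang': 'fr'}

print("test_parser.py".removeprefix("test_"))  # parser.py
print("test_parser.py".removesuffix(".py"))    # test_parser
print("parser.py".removeprefix("test_"))       # parser.py (unchanged if absent)
```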

3.10

  • Pattern matching (match and case) statements:
    • Introduces a match statement similar to switch-case, allowing pattern matching
    • Example
      match command:
          case 'start':
              start_process()
          case 'stop':
              stop_process()
  • Parenthesized Context Managers:
    • Allows using multiple context managers more neatly
    • Example: with (open('file1') as f1, open('file2') as f2):

3.11

  • Significant Performance Improvements:
    • The Python 3.11 release notes report CPython 3.11 as 10-60% faster than 3.10 depending on the workload (about 1.25x on average on the standard benchmark suite)
  • Exception Groups (ExceptionGroup) and except*:
    • Allows raising and handling multiple exceptions simultaneously.
    • except* is used to handle ExceptionGroup objects
      • This allows you to handle multiple exceptions raised together, enabling you to catch subsets of exceptions more precisely
    • Example:
      try:
        raise ExceptionGroup("Multiple Errors", [ValueError("Invalid value"), TypeError("Type mismatch")])
      except* ValueError as e:
        # e is an ExceptionGroup holding only the matching exceptions
        print(f"Caught ValueError: {e.exceptions}")
      except* TypeError as e:
        print(f"Caught TypeError: {e.exceptions}")
  • TaskGroup in asyncio:
    • Easier way to manage groups of asynchronous tasks
    • Example (inside an async function):
      async with asyncio.TaskGroup() as tg:
        tg.create_task(some_coroutine())

3.12

  • Enhanced async and await:
    • Improvements in asyncio and asynchronous task handling, including performance gains and asyncio.eager_task_factory() for running tasks eagerly

Do specific things:

  • Debug print:
    • DEBUG = True
      def print_debug(*args, **kwargs):
          if DEBUG:
              kwargs.setdefault('flush', True)  # flush immediately unless the caller overrides it
              print(*args, **kwargs)
  • Print the traceback of an error after catching an exception: print(traceback.format_exc()) (format_exc() returns a string; traceback.print_exc() prints it directly)
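A minimal sketch of capturing the traceback inside an except block; risky() is a hypothetical function that fails on purpose.

```python
import traceback


def risky() -> None:
    raise ValueError("bad input")


try:
    risky()
except ValueError:
    # format_exc() returns the current traceback as a string (handy for logging)
    tb_text = traceback.format_exc()
    print(tb_text)
    # traceback.print_exc() would print the same text straight to stderr
```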
  • Check if variable or attribute exists, without causing an error if it doesn't:
    • if 'myVar' in locals():
          print('myVar exists in local scope')
      if 'myVar' in globals():
          print('myVar exists in global scope')
      if hasattr(obj, 'attr_name'):
          print('obj.attr_name exists')
  • Convert string to json, only if it exists (isn't empty)
    • data = json.loads(body) if (body := event.get("body")) else body
  • Clean use of a ternary to choose the dictionary key:
    • email = os.environ.get('stg_email' if 'stg' in env else 'prod_email')
  • Concise and efficient way to get a value based on a specific input (similar to case statement):
    • type_ = query.get('type')
      type_num = {'a': 1, 'b': 2, 'c': 3}.get(type_, 0)   # Provide a default value (e.g., 0)
  • Build dictionary from list of keys:
    • def get_data_values(data):   # Dict[str, Any] -> Dict[str, Any]
          keys = ['a', 'b', 'c', 'd']
          return {
              'type': 'data',
              'id': data.get('data_id', ''),
              **{key: data.get(key, '') for key in keys}
          }
  • See list of installed and built-in modules: help('modules') (help() prints its output itself and returns None, so don't wrap it in print())
  • Check if object is an instance of a specific class(es)
    • isinstance(var, str): returns True if var is a str object (including instances of str subclasses)
    • isinstance(var, (str, int)): returns True if var is a str or an int object