## Lecture 4: Building a Python Package

### What is a Python Package

**Python package:** a folder containing files that correspond to the different modules of a project. Each module is a collection of functions that serve a specific purpose; the collection of all the functions across all modules is called a library.

Minimal python package structure:

Full python package structure:

### License and README

**License:** the MIT license is a clear statement that the project is available for everyone to use (it can be added when creating a GitHub repo; we can also add a Python .gitignore file at the same time).

**README.md file:** explains the purpose of a package and how to use it; below is a standard template for a README.md file

### Core Module

- In the core module (usually named base.py), we define the base class and functions.
- It usually starts with imports of the external packages (the dependencies listed in the pyproject.toml file) that the package needs to work.
- Note: we should only import what we need in each module (.py file). Imports placed in the "__init__.py" file are not shared with the other modules automatically; that file is typically used to expose the package's main classes and functions at the top level.

A Python class is a template for objects that bundles attributes:
- These include methods (functions defined in the class that its instances can call)
- And variables defined within the class

Below we define a base class for the company package:

In [None]:
class BaseClass:
    def __init__(self, name, ticker=None):  # __init__ is the method that is always called to initialise a class instance (object)
        self.name = name
        self.ticker = ticker
    
    def display_info(self): # we can also define other methods for other purposes
        print(f"Name: {self.name}")
        if self.ticker:
            print(f"Ticker is: {self.ticker}")

**Class inheritance:** we can define a child (or derived) class that inherits from the base class

In [None]:
# if we want the child class to be exactly the same as the base class, with the same attributes (same methods and in-class variables)
class SubClass(BaseClass):
    pass

# if we want the child class to have additional attributes
class SubClass(BaseClass):
    def __init__(self, name, specialty, manufacturer=False, ticker=None):
        super().__init__(name, ticker)  # reuses BaseClass's initialisation of name and ticker
        self.specialty = specialty
        self.manufacturer = manufacturer

    def display_info(self):
        super().display_info()  # calls BaseClass's display_info() method first
        print(f"Specialty: {self.specialty}")
        print(f"Manufacturer: {'Yes' if self.manufacturer else 'No'}")
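To see the inheritance at work, the sketch below repeats the two class definitions from above so that it runs on its own, then creates a child instance. The company name and ticker are made up for illustration.

```python
class BaseClass:
    def __init__(self, name, ticker=None):
        self.name = name
        self.ticker = ticker

    def display_info(self):
        print(f"Name: {self.name}")
        if self.ticker:
            print(f"Ticker is: {self.ticker}")


class SubClass(BaseClass):
    def __init__(self, name, specialty, manufacturer=False, ticker=None):
        super().__init__(name, ticker)  # let the base class set name and ticker
        self.specialty = specialty
        self.manufacturer = manufacturer

    def display_info(self):
        super().display_info()  # print the base information first
        print(f"Specialty: {self.specialty}")
        print(f"Manufacturer: {'Yes' if self.manufacturer else 'No'}")


# Hypothetical example values
child = SubClass(name="Acme Pharma", specialty="Oncology", ticker="ACME")
child.display_info()

# A child instance is also an instance of the base class
print(isinstance(child, BaseClass))

# The method resolution order shows where Python looks for attributes
mro_names = [cls.__name__ for cls in SubClass.__mro__]
print(mro_names)
```

Because `manufacturer` has a default value, it can be omitted when creating the instance, and `display_info()` then reports "No" for it.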


### Project Configuration File

The pyproject.toml file is the configuration file for the package. It specifies how to build the package, its dependencies, and other metadata.

In [None]:
[build-system]
requires = ["setuptools", "wheel", "setuptools_scm"]  # Build requirements
# setuptools builds the package into a distribution
# setuptools_scm provides automatic dynamic versioning based on git tags
# they can be installed into the virtual env with: pip install setuptools wheel setuptools_scm
build-backend = "setuptools.build_meta"

[project]
name = "project_name" # name here should match the name of the core package folder (my_package_core)
dynamic = ["version"] # version is declared dynamic because setuptools_scm sets it
description = "some description on package purpose"
readme = "README.md"
requires-python = ">=3.9"
license = { file = "LICENSE" }
authors = [
    { name = "Your Name", email = "your.email@example.com" }
]
keywords = ["keyword1", "keyword2", "keyword3"]
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Information Technology",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Topic :: Software Development :: Libraries"
]

# Runtime dependencies; version constraints for external packages can also be specified here
dependencies = [
    "package1",
    "package2",
    "package3",
]

[project.urls]
"Documentation" = "https://your-readthedocs-url-here"
"Source" = "https://github.com/yourusername/companies_package"
"Issues" = "https://github.com/yourusername/companies_package/issues"


[tool.setuptools_scm]
write_to = "my_package_core/version.py"  # Where to write the dynamic version

[tool.setuptools.packages.find]
where = ["."] # search for package in the project root

**Additional note on setuptools_scm**

To set up the dynamic versioning, we start with the following commands in a terminal, from inside the package folder:

In [None]:
git init
git add .
git commit -m "Initial commit"
git tag 0.0.0beta0

Then, if we run 'pip install -e .' or 'python -m build', setuptools_scm will derive the version from the latest Git tag and write it to the file we specified with write_to = "my_package_core/version.py".
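Once the package is installed, the version computed at build time is also stored in the installed package metadata. The following is a minimal sketch of reading it back with the standard-library `importlib.metadata` module. The helper name `get_installed_version` is hypothetical, and "project_name" is the placeholder name used in the configuration above.

```python
from importlib.metadata import PackageNotFoundError, version


def get_installed_version(distribution_name, fallback="unknown"):
    """Return the installed version of a distribution, or a fallback string.

    Hypothetical helper for illustration only.
    """
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment
        return fallback


# Prints the real version if "project_name" is installed, else "unknown"
print(get_installed_version("project_name"))
```

The fallback keeps the lookup from raising an error when the code is run from a source checkout that has not been installed.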

### Package Experiment

We can install an example package called company in editable mode:

In [None]:
# activate the virtual environment, then move to the company_package folder (the package directory)
source activate_env.sh
cd ../company_package
# "." means that the package is in the current directory
# "-e" installs the package in editable mode, so changes to the package are reflected immediately
pip install -e .

We can now import the company package:

In [None]:
import company as cp

Show the names (classes, submodules, and metadata) available in the package:

In [5]:
dir(cp)

['Company',
 'MedicalCompany',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 'base_company',
 'medical',
 'version']
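The top-level names `Company` and `MedicalCompany` in the listing above are most likely there because the package's `__init__.py` re-exports them. The notes do not show that file, so the following is only a sketch of the mechanism. It builds a throwaway package in a temporary folder (the name `demo_company` and its contents are made up for illustration) and shows what `dir()` reports for it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (hypothetical name: demo_company)
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_company"
pkg_dir.mkdir()

# A module that defines the class
(pkg_dir / "base_company.py").write_text(
    "class Company:\n"
    "    def __init__(self, name, ticker=None):\n"
    "        self.name = name\n"
    "        self.ticker = ticker\n"
)

# The re-export: a relative import inside __init__.py
(pkg_dir / "__init__.py").write_text("from .base_company import Company\n")

# Make the temporary folder importable, then import the package
sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo = importlib.import_module("demo_company")

# Both the class and the submodule it came from are visible at the top level
public_names = [name for name in dir(demo) if not name.startswith("_")]
print(public_names)

# The class can be used without naming the submodule
print(demo.Company("Nvidia", ticker="NVDA").name)
```

Without the import line in `__init__.py`, the class would only be reachable as `demo_company.base_company.Company`, and only after that submodule had been imported.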

Print the cp package version:

In [18]:
cp.__version__

'0.0.post6'

Print the cp package path:

In [19]:
cp.__path__

['/Users/yuhaohuo/Desktop/code/rc_class/company_package/company']

Create a company instance for Nvidia and show the attributes of the newly created my_company object:

In [7]:
my_company = cp.Company(name="Nvidia", ticker="NVDA")
dir(my_company)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'display_info',
 'get_stock_info',
 'get_yfinance_status',
 'name',
 'summarize_activity',
 'ticker']

The my_company instance is initialized with the attribute name="Nvidia":

In [9]:
my_company.name

'Nvidia'

We can also experiment with the display_info method, which prints company name and ticker of the my_company object

In [10]:
my_company.display_info()

Company Name: Nvidia
Ticker Symbol is: NVDA


Get stock history of Nvidia:

In [20]:
# we can also check the availability of stock data
my_company.get_yfinance_status()
# if it is available, we can get its stock history
stock_history = my_company.get_stock_info(period="1mo")
stock_history.head()

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2025-10-15 00:00:00-04:00,184.800003,184.869995,177.289993,179.830002,214450500,0.0,0.0
2025-10-16 00:00:00-04:00,182.229996,183.279999,179.770004,181.809998,179723300,0.0,0.0
2025-10-17 00:00:00-04:00,180.179993,184.100006,179.75,183.220001,173135200,0.0,0.0
2025-10-20 00:00:00-04:00,183.130005,185.199997,181.729996,182.639999,128544700,0.0,0.0
2025-10-21 00:00:00-04:00,182.789993,182.789993,179.800003,181.160004,124240200,0.0,0.0


We may want to add a subclass MedicalCompany in the medical submodule. To do so, we add the lines below to the medical.py file in the medical directory.

In [None]:
from ..base_company import Company #we import the base class
# Here, with .. we go up one level in the folder structure and ask to import the Company class that is defined in the base_company.py file. This is an example of a relative import.

class MedicalCompany(Company):
    def __init__(self, name, specialty, drug_manufacturer=False, ticker=None):
        super().__init__(name, ticker)
        self.specialty = specialty
        self.drug_manufacturer = drug_manufacturer

    def display_info(self):
        """Displays basic information about the medical company."""
        super().display_info()
        print(f"Medical Specialty: {self.specialty}")
        print(f"Drug Manufacturer: {'Yes' if self.drug_manufacturer else 'No'}")


Then we can use the MedicalCompany class in medical submodule:

In [14]:
import company as cp

med_comp = cp.MedicalCompany(name="HealthCare Inc.", specialty="Oncology", drug_manufacturer=True, ticker="HCI")
med_comp.display_info()

Company Name: HealthCare Inc.
Ticker Symbol is: HCI
Medical Specialty: Oncology
Drug Manufacturer: Yes


**Adding package data:** It can be useful to store data in your package, although in general it is advised not to include data files in a package. When data does need to ship with the package, we create a data folder in the core package folder and put the data there.

In [None]:
# In the python package, there is a dataset with drug approval data drug_data.csv.
# This file is in company_package/company/data/drug_data.csv.
company_package/
├── README.md
├── company
│   ├── __init__.py
│   ├── base_company.py
│   ├── version.py
│   ├── medical
│   │   ├── __init__.py
│   │   └── medical.py
│   └── data
│       └── drug_data.csv
...

In [None]:
# Project configuration pyproject.toml should record it:
[tool.setuptools.package-data]
"company" = ["data/*"]
# This tells setuptools to include the data in the package distribution, and that the data is in the company folder

In [None]:
# the methods defined for the MedicalCompany class can then access this drug data file
# note that we can use the files function from the importlib.resources module to get the path to the data file automatically
from importlib.resources import files
dataset_path = files("company").joinpath("data/drug_data.csv")
# importlib.resources is a Python standard-library module used to access files that are packaged inside a Python package
# files("company") returns an object representing the root directory of the given package
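The lookup above can be tried without the company package. The sketch below builds a throwaway package with a data folder in a temporary directory and reads the file back through `files()`. The package name `demo_pkg`, the CSV column names, and the rows are all made up for illustration; the real drug_data.csv layout is not shown in these notes.

```python
import csv
import importlib
import io
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

# Build a throwaway package with a data folder (all names are hypothetical)
root = Path(tempfile.mkdtemp())
data_dir = root / "demo_pkg" / "data"
data_dir.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(data_dir / "drug_data.csv").write_text(
    "drug,failed_attempts\nDrugA,2\nDrugB,1\n"  # made-up demo rows
)

# Make the temporary folder importable
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Locate the file relative to the package, not the current working directory
dataset = files("demo_pkg").joinpath("data/drug_data.csv")

# Read the resource as text and parse it with the csv module
text = dataset.read_text(encoding="utf-8")
rows = list(csv.DictReader(io.StringIO(text)))
print(rows)
```

Resolving the path through the package keeps the lookup independent of where the script is launched from, which a relative path such as "data/drug_data.csv" would not.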

In [17]:
# experiment with the drug_approval_summary() method, which uses this drug data file
import company as cp
med_comp = cp.MedicalCompany(name="PharmaCorp", specialty="Oncology", drug_manufacturer=True)
med_comp.drug_approval_summary()


Drug Approval Summary for PharmaCorp:
 - DrugA: 2 failed attempt(s) before approval
 - DrugB: 1 failed attempt(s) before approval
 - DrugE: 0 failed attempt(s) before approval
 - DrugF: 4 failed attempt(s) before approval


### Parameter Passing for Class Objects

- In the definition of every instance method, **self** is the first parameter; Python passes it automatically when the method is called on an instance. The other parameters can be passed to set the attributes of the instance.
- If a parameter is presented with an equal sign and a default value, it is called an **optional parameter**.
- Otherwise, it is a **required parameter**.
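The distinction can be checked programmatically. The sketch below uses a small stand-in `Company` class (not the one from the company package) and the standard-library `inspect` module to classify each parameter of `__init__`.

```python
import inspect


class Company:
    """Stand-in class for illustration only."""

    def __init__(self, name, ticker=None):
        self.name = name
        self.ticker = ticker


# Classify each parameter of __init__ by whether it has a default value
kinds = {}
for param_name, param in inspect.signature(Company.__init__).parameters.items():
    if param_name == "self":
        continue  # self is supplied by Python, not by the caller
    has_default = param.default is not inspect.Parameter.empty
    kinds[param_name] = "optional" if has_default else "required"
print(kinds)

# The optional parameter can be left out and falls back to its default
only_name = Company("Nvidia")
print(only_name.ticker)

# Leaving out a required parameter raises a TypeError
try:
    Company()
except TypeError as error:
    print(f"TypeError: {error}")
```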

It is also common to use two additional parameters when defining methods: *args and **kwargs.
- *args is used to pass a variable number of positional arguments to the method.
- **kwargs is used to pass a variable number of keyword arguments to the method.

Below is an example of how *args and **kwargs work:

In [None]:
class Company:

    ...

    def summarize_activity(self, *args, **kwargs):
        """
        Summarizes company activities and additional information.

        Parameters:
        - *args: Any number of activities related to the company (collected into a tuple).
        - **kwargs: Additional information, like location or date.
        """
        print(f"\nActivity Summary for {self.name}:")

        if args:
            print("Activities:")
            for activity in args:  # args is a tuple of the positional arguments
                print(f" - {activity}")

        if kwargs:
            print("Additional Information:")
            for key, value in kwargs.items():  # kwargs is a dictionary of key-value pairs
                print(f" - {key.capitalize()}: {value}")

- In the call below, we pass two strings as positional arguments, which are collected in args as a tuple.
- We also pass two key-value pairs as keyword arguments, which are stored in kwargs as a dictionary.
- Note that keyword arguments must come after the positional arguments.

In [None]:
my_company.summarize_activity(  # my_company is the Company instance created earlier
    "Researching new drugs", "Launching a public health campaign",
    location="New York", date="2024-10-27"
)
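To make the containers visible, here is a small standalone sketch. The function `inspect_arguments` is hypothetical; it returns what Python collects, showing that the extra positional arguments arrive as a tuple and the keyword arguments as a dictionary. It also shows the reverse direction, unpacking an existing list and dictionary into a call.

```python
def inspect_arguments(*args, **kwargs):
    """Return the containers Python builds for the extra arguments."""
    return args, kwargs


positional, keyword = inspect_arguments(
    "Researching new drugs", "Launching a public health campaign",
    location="New York", date="2024-10-27",
)
print(type(positional).__name__, positional)
print(type(keyword).__name__, keyword)

# The same operators unpack existing containers at the call site
activities = ["Researching new drugs", "Launching a public health campaign"]
details = {"location": "New York", "date": "2024-10-27"}
unpacked_positional, unpacked_keyword = inspect_arguments(*activities, **details)
print(unpacked_positional == positional, unpacked_keyword == keyword)
```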

### Turning methods into commands

We might want to turn some methods in our package into commands that can be run from the command line. To do so, we need to create a cli.py file in the core package folder:

In [None]:
company_package/
├── README.md
├── company
│   ├── __init__.py
│   ├── base_company.py
│   ├── version.py
│   ├── medical
│   │   ├── __init__.py
│   │   └── medical.py
│   └── cli.py
...

Inside the cli.py file, we will have code like below:

In [None]:
import argparse
from my_package_name import core_class

##### here we first pin down what command we want to call in the command line

# the first command is a function that uses method 1 from the core_class of our package
def function1(input1):
    instance = core_class(attr_1=input1)
    instance.method_1()

# the second command is a function that uses method 2 from the core_class of our package
def function2(input1, input2, input3):
    instance = core_class(attr_1=input1)
    instance.method_2(param_1 = input2, param_2 = input3)

##### end of first part



##### now we set up the command-line interface

def main():
    # build the command-line interface for the package
    parser = argparse.ArgumentParser(description="<my_package_name> CLI Tool")
    subparsers = parser.add_subparsers(dest="command")

    # command1 (for function1)
    parser_function1 = subparsers.add_parser("function1", help="<what is the purpose of function1>")
    parser_function1.add_argument("--input1", type=str, required=True, help="<what this input is about>")

    # command2 (for function2)
    parser_function2 = subparsers.add_parser("function2", help="<what is the purpose of function2>")
    parser_function2.add_argument("--input1", type=str, required=True, help="<what this input is about>")
    parser_function2.add_argument("--input2", type=int, required=True, help="<what this input is about>")
    parser_function2.add_argument("--input3", type=float, required=True, help="<what this input is about>")

    # decide which command (function) to execute based on the input from the command line
    args = parser.parse_args()
    if args.command == "function1":
        function1(args.input1)
    elif args.command == "function2":
        function2(args.input1, args.input2, args.input3)
    else:
        parser.print_help()  # no command given: show the usage message

# just ensure the CLI runs
if __name__ == "__main__":
    main()

Then we need to tell the package that it needs to create console commands for the package from the methods in the cli.py file. We do it in the pyproject.toml file, by adding:

In [None]:
[project.scripts]
my_package_name = "my_package_name.cli:main"

Now, after installing our package with pip, we can run in bash:

In [None]:
my_package_name function1 --input1 <string-I-want-to-input>

In [None]:
my_package_name function2 --input1 <string-I-want-to-input> --input2 <int-I-want-to-input> --input3 <float-I-want-to-input>

We can also ask for help on the commands by running:

In [None]:
my_package_name --help

And further details on a specific command by running:

In [None]:
my_package_name function1 --help
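The dispatch logic can also be exercised without installing anything, because `parse_args` accepts an explicit list of arguments instead of reading the real command line. The sketch below is a self-contained toy version of the template above. The command names `greet` and `scale` and the functions behind them are made up for illustration.

```python
import argparse


def greet(name):
    """Hypothetical command: build a greeting."""
    return f"Hello, {name}"


def scale(name, count, factor):
    """Hypothetical command: multiply count by factor."""
    return f"{name}: {count * factor}"


def build_parser():
    parser = argparse.ArgumentParser(prog="demo_cli", description="Demo CLI tool")
    # required=True makes argparse report an error when no command is given
    subparsers = parser.add_subparsers(dest="command", required=True)

    parser_greet = subparsers.add_parser("greet", help="greet a company")
    parser_greet.add_argument("--name", type=str, required=True)

    parser_scale = subparsers.add_parser("scale", help="scale a count")
    parser_scale.add_argument("--name", type=str, required=True)
    parser_scale.add_argument("--count", type=int, required=True)
    parser_scale.add_argument("--factor", type=float, required=True)
    return parser


def run(argv):
    """Parse an explicit argument list and call the matching function."""
    args = build_parser().parse_args(argv)
    if args.command == "greet":
        return greet(args.name)
    return scale(args.name, args.count, args.factor)


print(run(["greet", "--name", "Nvidia"]))
print(run(["scale", "--name", "Nvidia", "--count", "3", "--factor", "1.5"]))
```

Note that argparse converts the strings "3" and "1.5" to an int and a float because of the `type=` arguments, so the functions receive values of the expected type.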

### Naming Conventions

The style guide for Python code is called PEP 8 (PEP stands for Python Enhancement Proposal).

**Package and Module (File and Folder) Names:** 
- Convention: Use lowercase letters. You can use underscores (_) when necessary.
- Reason: Keeps names concise and readable, and avoids naming conflicts.

**Classes:**
- Convention: Use PascalCase (like BaseClass).
- Reason: Easily distinguish classes from variables or functions/methods.

**Methods and Variables:**
- Convention: Use snake_case (all lowercase with underscores between words).
- Such as: my_method, variable_one

**Additional Notes:** 
- Constants should use UPPERCASE_WITH_UNDERSCORES.
- Private or “internal use only” variables/methods should begin with a single underscore, like _private_method.
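The conventions above can be seen side by side in one short sketch. All names here (`DEFAULT_CURRENCY`, `PriceFormatter`, and its methods) are made up for illustration.

```python
DEFAULT_CURRENCY = "USD"  # constant: UPPERCASE_WITH_UNDERSCORES


class PriceFormatter:  # class: PascalCase
    def __init__(self, decimal_places=2):
        self.decimal_places = decimal_places  # variable: snake_case
        self._call_count = 0  # internal attribute: leading underscore

    def format_price(self, amount):  # method: snake_case
        self._call_count += 1
        return self._build_label(amount)

    def _build_label(self, amount):  # internal method: leading underscore
        return f"{amount:.{self.decimal_places}f} {DEFAULT_CURRENCY}"


price_formatter = PriceFormatter()  # instance name: snake_case
formatted = price_formatter.format_price(181.16)
print(formatted)
```

In a package, this code would live in a lowercase module file such as price_formatter.py, following the module naming convention.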

Private variables and methods are those that should not be used or accessed from outside the class. In the example below, _private_multiply is a private method (it should only be used within the class) and _factor is a private attribute (it is set internally and should not be modified from outside).

In [None]:
class Calculator:
    def __init__(self):
        self._factor = 2  # Private attribute for internal use

    def multiply(self, number):
        """Public method to multiply a number by the private factor."""
        return self._private_multiply(number, self._factor)

    def _private_multiply(self, num1, num2):
        """Private method to perform multiplication."""
        return num1 * num2

# Usage example
calc = Calculator()

# Using the public method (preferred)
result = calc.multiply(5)
print(result)  # Output: 10

# Accessing the private method directly (discouraged but possible)
direct_result = calc._private_multiply(5, 3)
print(direct_result)  # Output: 15