---
title: Tools for Best Python Practices
jupyter:
  jupytext:
    formats: ipynb,qmd
    text_representation:
      extension: .qmd
      format_name: quarto
      format_version: '1.0'
      jupytext_version: 1.16.7
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

In [None]:
!pip install hydra-core

When writing code, it is good practice to keep the values you might change in a file separate from your script.

This not only saves you from searching through your scripts for a specific variable but also makes your scripts more reproducible.
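
As a rough illustration of the idea using only the standard library, the sketch below keeps the changeable values in a JSON file and reads them at startup. JSON is swapped in here purely to keep the sketch dependency-free; the file name `settings.json`, its keys, the sample row, and the `drop_columns` helper are all hypothetical, and a temporary directory stands in for the project folder.

```python
import json
import tempfile
from pathlib import Path


def drop_columns(row: dict, columns: list) -> dict:
    """Return a copy of row without the listed columns."""
    return {key: value for key, value in row.items() if key not in columns}


with tempfile.TemporaryDirectory() as project_dir:
    config_file = Path(project_dir) / "settings.json"
    # The values you expect to change live in the file, not in the processing code.
    config_file.write_text(json.dumps({"drop_features": ["iid", "id"]}))

    # The script only knows the key name, not the concrete values.
    config = json.loads(config_file.read_text())

row = {"iid": 1, "id": 2, "age": 30}
print(drop_columns(row, config["drop_features"]))
```

Changing which columns are dropped now means editing the config file only; the processing function stays untouched.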

My favorite tool for handling config files is Hydra. The example below shows how to read values from a config file with it.

All parameters are specified in a configuration file named `config.yaml`: 

```yaml
# config.yaml
data: data1
variables:
  drop_features: ['iid', 'id', 'idg', 'wave']
  categorical_vars: ['undergra', 'zipcode']
```

In a separate file named `main.py`, the parameters in `config.yaml` are read using Hydra:
```python
# main.py
import hydra 

@hydra.main(config_name='config.yaml')
def main(config):
    print(f'Process {config.data}')
    print(f'Drop features: {config.variables.drop_features}')

if __name__ == '__main__':
    main()
```

In your terminal, run:
```bash
$ python main.py
```
Output:

In [14]:
!python hydra_examples/main.py

config_path is not specified in @hydra.main().
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/changes_to_hydra_main_config_path for more information.
  @hydra.main(config_name='config.yaml')
Process data1
Drop features: ['iid', 'id', 'idg', 'wave']

The first three lines of this output are a Hydra warning that `config_path` was not passed to `@hydra.main()`. The script still runs and prints the values from `config.yaml`; the linked upgrade note describes how passing `config_path` explicitly avoids the warning.


[Link to my article about Hydra](https://towardsdatascience.com/introduction-to-hydra-cc-a-powerful-framework-to-configure-your-data-science-projects-ed65713a53c6?sk=eb08126922cc54a40c2fdfaea54c708d).

[Link to Hydra](https://hydra.cc/). 


### Store Sensitive Information Securely in Python with .env Files

In [None]:
!pip install python-dotenv

Hard-coding configuration and sensitive data in your code creates security risks and deployment headaches: secrets can leak through version control, and values have to be changed by hand for each environment. For example:

In [None]:
PASSWORD = '123'
USERNAME = 'myusername'

Python-dotenv lets you separate configuration from code by loading environment variables from a `.env` file. You can:

- Keep sensitive data out of code
- Use different configurations per environment

Here is an example:

In [1]:
%%writefile .env
PASSWORD=123
USERNAME=myusername

Overwriting .env


In [3]:
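from dotenv import load_dotenv
import os

# Load the variables defined in .env into the process environment
# (by default, variables that are already set are not overridden)
load_dotenv()
PASSWORD = os.getenv('PASSWORD')
USERNAME = os.getenv('USERNAME')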
print(PASSWORD)
print(USERNAME)

123
myusername
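
One thing to keep in mind: `os.getenv` returns `None` when a variable is not set, so a misspelled name can go unnoticed until much later. Below is a minimal sketch of a fail-fast wrapper using only the standard library. The helper `require_env` and the variable names `APP_DEMO_TOKEN` and `APP_DEMO_MISSING` are hypothetical and not part of python-dotenv; the value is set by hand to simulate what `load_dotenv()` would have loaded.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is unset."""
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Simulate a variable that load_dotenv() would have placed in the environment.
os.environ["APP_DEMO_TOKEN"] = "123"
# Make sure the second name really is unset for the demonstration.
os.environ.pop("APP_DEMO_MISSING", None)

print(require_env("APP_DEMO_TOKEN"))

try:
    require_env("APP_DEMO_MISSING")
except RuntimeError as err:
    print(err)
```

Calling such a helper once at startup surfaces a missing secret immediately instead of in the middle of a run.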


[Link to python-dotenv](https://github.com/theskumar/python-dotenv)

### Type-Safe Configuration Management with pydantic-settings

In [None]:
!pip install "pydantic-settings"

Managing configuration settings without proper validation can lead to runtime errors and type-related issues. Consider this problematic approach:

In [10]:
%env DATABASE_URL=postgresql://localhost:5432/db
%env MAX_CONNECTIONS=10
%env DEBUG=False

env: DATABASE_URL=postgresql://localhost:5432/db
env: MAX_CONNECTIONS=10
env: DEBUG=False


In [11]:
import os

DATABASE_URL = os.getenv('DATABASE_URL', 'postgresql://localhost:5432/db')
MAX_CONNECTIONS = os.getenv('MAX_CONNECTIONS', '10')  # Still a string; needs manual conversion to int
DEBUG = bool(os.getenv('DEBUG', 'False'))  # Bug: bool() of any non-empty string is True

print(f"Database URL: {DATABASE_URL}")
print(f"Max Connections: {MAX_CONNECTIONS}")
print(f"Debug Mode: {DEBUG}")

Database URL: postgresql://localhost:5432/db
Max Connections: 10
Debug Mode: True
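
The `Debug Mode: True` line above is the bug: `DEBUG` was set to `False`, but `bool()` of any non-empty string is `True`. The sketch below shows the kind of manual parsing you would need without a validation library. The `parse_bool` helper and the spellings it accepts are illustrative assumptions, not a standard.

```python
def parse_bool(raw: str) -> bool:
    """Convert common textual spellings of a boolean into a real bool."""
    normalized = raw.strip().lower()
    if normalized in {"1", "true", "yes", "on"}:
        return True
    if normalized in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"Cannot interpret {raw!r} as a boolean")


print(bool("False"))        # True: any non-empty string is truthy
print(parse_bool("False"))  # False
print(int("10") + 1)        # 11: numeric settings need an explicit int() call
```

Every setting that is not a plain string needs a conversion like this, which is the repetitive work a settings library takes over.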


Pydantic-settings provides type-safe configuration management with automatic validation. Here's how to use it:

Define your settings with type hints:

In [5]:
from pydantic_settings import BaseSettings, SettingsConfigDict
from pydantic import PostgresDsn

class DatabaseSettings(BaseSettings):
    # Read each field from an environment variable prefixed with DB_
    model_config = SettingsConfigDict(env_prefix='DB_')

    url: PostgresDsn = PostgresDsn('postgresql://localhost:5432/db')
    max_connections: int = 10
    debug: bool = False

Set environment variables:

In [8]:
%env DB_URL=postgresql://localhost:5432/production
%env DB_MAX_CONNECTIONS=20
%env DB_DEBUG=True

env: DB_URL=postgresql://localhost:5432/production
env: DB_MAX_CONNECTIONS=20
env: DB_DEBUG=True


Load settings from environment variables:

In [12]:
settings = DatabaseSettings()

# Access validated settings
print(f"Database URL: {settings.url}")
print(f"Max Connections: {settings.max_connections}")
print(f"Debug Mode: {settings.debug}")

Database URL: postgresql://localhost:5432/production
Max Connections: 20
Debug Mode: True
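
To get a feel for what the `DatabaseSettings` class above saves you, here is a hand-written approximation using only the standard library. It is a sketch of the manual work, not how pydantic-settings is implemented. The names `ManualDatabaseSettings` and `load_manual_settings` are hypothetical, and the function takes a plain mapping so it can be run without touching the real environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManualDatabaseSettings:
    url: str
    max_connections: int
    debug: bool


def load_manual_settings(env, prefix="DB_"):
    """Read prefixed variables from a mapping and convert each one by hand."""
    raw_debug = env.get(f"{prefix}DEBUG", "False").strip().lower()
    if raw_debug not in {"true", "false"}:
        raise ValueError(f"{prefix}DEBUG must be true or false, got {raw_debug!r}")
    return ManualDatabaseSettings(
        url=env.get(f"{prefix}URL", "postgresql://localhost:5432/db"),
        # int() raises ValueError for input such as "abc"
        max_connections=int(env.get(f"{prefix}MAX_CONNECTIONS", "10")),
        debug=raw_debug == "true",
    )


settings = load_manual_settings(
    {
        "DB_URL": "postgresql://localhost:5432/production",
        "DB_MAX_CONNECTIONS": "20",
        "DB_DEBUG": "True",
    }
)
print(settings)
```

In a real script you would pass `os.environ` instead of the literal dict. Each new field adds another line of conversion and error handling here, whereas the pydantic-settings version only needs a type hint.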


Compare with python-dotenv, which loads every value as a plain string:

In [16]:
from dotenv import load_dotenv
import os

load_dotenv()
database_url = os.getenv('DATABASE_URL')  # No type validation
debug = bool(os.getenv('DEBUG'))  # True for any non-empty string, including 'False'

[Link to pydantic-settings](https://github.com/pydantic/pydantic-settings)

### docopt: Create Beautiful Command-line Interfaces for Documentation in Python

In [None]:
!pip install docopt 

Writing documentation for your Python script helps others understand how to use it. However, instead of making them search through the script for that documentation, wouldn't it be nice if they could view it in the terminal?

That is where docopt comes in handy. docopt creates a command-line interface from a usage message that you pass to it as a Python string.

To see how docopt works, add a docstring at the top of a file named `docopt_example.py` and pass it to `docopt()`:

In [None]:
%%writefile docopt_example.py
"""Extract keywords of an input file
Usage:
    docopt_example.py --data-dir=<data-directory> [--input-path=<path>]
Options:
    --data-dir=<path>    Directory of the data
    --input-path=<path>  Name of the input file [default: input_text.txt]
"""

from docopt import docopt 

if __name__ == '__main__':
    args = docopt(__doc__, argv=None, help=True)
    data_dir = args['--data-dir']
    input_path = args['--input-path']

    if data_dir:
        print(f"Extracting keywords from {data_dir}/{input_path}")

Running `docopt_example.py` without the required `--data-dir` option prints the usage message:

```bash
$ python docopt_example.py
```

In [23]:
!python docopt_example.py

Usage:
    docopt_example.py --data-dir=<data-directory> [--input-path=<path>]
Options:
    --data-dir=<path>    Directory of the data
    --input-path=<path>  Name of the input file [default: input_text.txt]
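
For comparison, here is roughly the same interface written by hand with the standard library's `argparse`. This is a swapped-in technique, shown only to illustrate what docopt derives from the docstring; it is not part of docopt. The `build_parser` function is a hypothetical helper, and the argument list is passed explicitly so the sketch runs anywhere.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hand-written equivalent of the usage message in docopt_example.py."""
    parser = argparse.ArgumentParser(
        prog="docopt_example.py",
        description="Extract keywords of an input file",
    )
    parser.add_argument("--data-dir", required=True, metavar="<path>",
                        help="Directory of the data")
    parser.add_argument("--input-path", default="input_text.txt", metavar="<path>",
                        help="Name of the input file")
    return parser


# Parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--data-dir=data"])
print(f"Extracting keywords from {args.data_dir}/{args.input_path}")
```

With `argparse` the help text is assembled from the parser calls; with docopt the help text is written first and the parser is derived from it.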


[Link to docopt](http://docopt.org/).