# Python data science idioms

- hide: true
- sticky_rank: 1
- toc: true
- comments: true
- categories: [python]

A collection of best-practice solutions to tasks I find myself doing over and over again.

In [7]:
import os

'NSPL_AUG_2020_UK.csv'.lower()

'nspl_aug_2020_uk.csv'

# Data science

In [1]:
from imports import *

%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 2

## Preprocessing pipeline

Each dataset is unique, but the steps to clean it are not. Here I document the series of steps I use to clean each dataset I work with and to ensure its quality.

In [1]:
# read data (from disk or s3)
# clean (standardise varnames, drop unneeded vars, etc.)
# reshape
# ...
# test
# write to disk or s3
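
The outline above could be sketched as a chain of small functions joined with `DataFrame.pipe`. This is only a minimal sketch: the function names, the `unused` column, and the toy data are hypothetical, and the read and write steps are left out so that the example runs without any files.

```python
import pandas as pd


def clean(df):
    """Standardise variable names and drop unneeded columns."""
    df = df.rename(columns=lambda c: c.strip().lower().replace(' ', '_'))
    # 'unused' is a hypothetical column name; errors='ignore' skips it if absent.
    return df.drop(columns=['unused'], errors='ignore')


def reshape(df):
    """Placeholder for real reshaping: sort rows into a stable order."""
    return df.sort_values('date').reset_index(drop=True)


def validate(df):
    """Fail early if basic expectations about the data do not hold."""
    assert not df.empty, 'dataframe is empty'
    assert df.columns.is_unique, 'duplicate column names'
    return df


def run_pipeline(df):
    return df.pipe(clean).pipe(reshape).pipe(validate)


# Toy input standing in for data read from disk or s3.
raw = pd.DataFrame({
    ' Date ': ['2020-02-01', '2020-01-01'],
    'Value': [2, 1],
    'unused': [0, 0],
})
tidy = run_pipeline(raw)
print(tidy)
```

Keeping each step a function that takes and returns a dataframe makes it easy to test the steps one at a time.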

### Reorder dataframe columns

In [None]:
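first = ['date', 'yw', 'pcsector']
# Use a list comprehension rather than a set difference so that the
# remaining columns keep their original order.
rest = [col for col in df.columns if col not in first]
df = df[first + rest]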

### Select a subset of users from a panel dataset based on user-level criteria
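
One way to do this, as a sketch rather than a definitive recipe, is to compute each user-level criterion with `groupby(...).transform(...)`, which broadcasts the per-user value back to every row, and then filter with a boolean mask. The column names `user_id` and `amount` and the thresholds are hypothetical.

```python
import pandas as pd

# Toy panel: one row per user-observation.
df = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 2, 3],
    'amount': [10, 20, 5, 5, 5, 100],
})

# User-level criteria, aligned with the rows of df.
n_obs = df.groupby('user_id')['amount'].transform('size')
total = df.groupby('user_id')['amount'].transform('sum')

# Keep all rows of users with at least 2 observations and a total of at least 20.
subset = df[(n_obs >= 2) & (total >= 20)]
print(subset)
```

Only user 1 meets both criteria here, so `subset` contains that user's two rows.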

- I have a folder of files which I want to clean and append. How do I do this?
- I want to convert a df column to datetime. How do I do this (at read time, using NumPy, using Pandas)? What are the tradeoffs?
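
A possible starting point for both questions, assuming CSV files that share a `date` column in `YYYY-MM-DD` format (the function name `read_folder`, the file names, and the column names are hypothetical): read each file, convert the column with `pd.to_datetime` and an explicit `format`, and append everything with `pd.concat`. Passing `parse_dates=['date']` to `pd.read_csv` is a more compact alternative that converts at read time, while the explicit `format` avoids relying on format inference.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_folder(folder, pattern='*.csv'):
    """Read every matching file, clean it, and append all into one frame."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        df = pd.read_csv(path)
        # Convert after reading, with an explicit format.
        df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Create two small files in a temporary folder to demonstrate.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'a.csv').write_text('date,value\n2020-01-01,1\n')
    Path(tmp, 'b.csv').write_text('date,value\n2020-01-02,2\n')
    combined = read_folder(tmp)

print(combined.dtypes)
```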

# General Python

## Parsing command line arguments

Rationale:

- Passing the argument list to the parser explicitly ensures that `sys.argv`, which is mutable, doesn't get altered between command invocation and execution of `main`.

In [3]:
import argparse
import sys


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('argname')
    parser.add_argument('--test', action='store_true',
                        help='Read subset of files only.')
    parser.add_argument('--nrows', type=int,
                        help='Number of rows to read per file.')
    # Parse the list passed in rather than reading sys.argv implicitly.
    return parser.parse_args(argv)


def main(argv=None):
    if argv is None:
        # Copy the arguments once, at the start of main.
        argv = sys.argv[1:]
    args = parse_args(argv)
    print(f'argname is {args.argname}')
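
A side benefit of handing the argument list to the parser explicitly is that parsing can be tried out without touching `sys.argv` at all. A minimal sketch, with hypothetical argument values:

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('argname')
    parser.add_argument('--test', action='store_true')
    parser.add_argument('--nrows', type=int)
    # Parse the explicit list instead of the global sys.argv.
    return parser.parse_args(argv)


# 'myfile.csv' and 100 are made-up values for illustration.
args = parse_args(['myfile.csv', '--nrows', '100'])
print(args.argname, args.test, args.nrows)
```

This prints `myfile.csv False 100`; `--test` defaults to `False` because of `action='store_true'`.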

## Use `ast.literal_eval()` instead of `eval()`

The reason is that `eval` is very [dangerous](https://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html): it will happily evaluate a string like `os.system('rm -rf /')`, whereas `ast.literal_eval` only evaluates Python [literals](https://docs.python.org/3/library/ast.html#ast.literal_eval).
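
A small illustration of the difference, using a harmless call in place of a destructive one: `ast.literal_eval` parses a string containing only literals, and raises `ValueError` when the string contains a function call.

```python
import ast

# A string made up only of literals is parsed into Python objects.
parsed = ast.literal_eval("{'a': [1, 2, 3], 'b': (4.5, None)}")
print(parsed)

# Anything that is not a literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```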

## Sources

- [Fluent Python](https://www.oreilly.com/library/view/fluent-python/9781491946237/)
- [Python Cookbook](https://www.oreilly.com/library/view/python-cookbook-3rd/9781449357337/)
- [Learning Python](https://www.oreilly.com/library/view/learning-python-5th/9781449355722/)
- [The Hitchhiker's Guide to Python](https://docs.python-guide.org/writing/structure/)
- [Effective Python](https://effectivepython.com)
- [Python for Data Analysis](https://www.oreilly.com/library/view/python-for-data/9781491957653/)
- [Python Data Science Handbook](https://www.oreilly.com/library/view/python-data-science/9781491912126/)