# Pycheat - Python Cheatsheet

Unless stated otherwise: Python 3  
Link in markdown: \[blue_text](url_here)  
[Help for markdown](https://commonmark.org/help/tutorial/index.html)  
[Built-in magic Jupyter commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html)

In [7]:
%%latex
\begin{equation}
H \leftarrow 60 + \frac{30(B - R)}{V_{max} - V_{min}}, \quad \text{if } V_{max} = G
\end{equation}

<IPython.core.display.Latex object>

In [1]:
# To run a bash command, simply put an exclamation mark at the beginning of the line
!echo test

test

## Naming convention
[difference between module/class/package](https://softwareengineering.stackexchange.com/a/111882/195918)

[PEP 0008](https://www.python.org/dev/peps/pep-0008/#package-and-module-names) says that:

- **modules (filenames)**: should have short, all-lowercase names, and they can contain underscores;
- **packages (directories)**: should have short, all-lowercase names, preferably without underscores;
- **classes**: should use the CapWords convention.

    

## Misc

In [12]:
# Computing time for a line
%timeit [i for i in range(100000)]

12.9 ms ± 1.49 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [2]:
%%timeit -r 1
# Computing time for a cell: %%timeit must be the first line of the cell
# -r 1 runs the measurement only once
for i in range(1000):
    i

36.3 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 10000 loops each)
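
Outside Jupyter, the standard-library `timeit` module gives a comparable measurement. The sketch below is only an illustration: the timed statement, the helper name `build_list`, and the `number`/`repeat` values are arbitrary choices, not part of the original notebook.

```python
import timeit

# Time a statement given as a string: total seconds for `number` executions
total = timeit.timeit("[i for i in range(1000)]", number=100)
print("total for 100 runs: {:.6f} s".format(total))

# Time a callable several times and keep the fastest (least disturbed) run
def build_list():
    return [i for i in range(1000)]

runs = timeit.repeat(build_list, repeat=3, number=100)
best = min(runs)
print("best of 3: {:.6f} s".format(best))
```

Unlike `%timeit`, the module reports the total time for all `number` executions, so divide by `number` to get a per-loop figure.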


## Lists

In [None]:
xs = [1,2,3,4,5]
ys = [6,7,8,9,10]

In [25]:
xs + ys # List concatenation

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
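
A few other everyday list operations, sketched on the same `xs` and `ys`. The variable names `first_two`, `reversed_xs`, `zs` and `pairs` are introduced here for illustration only.

```python
xs = [1, 2, 3, 4, 5]
ys = [6, 7, 8, 9, 10]

# Slicing returns a new list: [start:stop:step], stop excluded
first_two = xs[:2]
reversed_xs = xs[::-1]

# extend() appends in place, whereas + builds a new list
zs = list(xs)  # shallow copy so xs stays untouched
zs.extend(ys)

# zip() pairs the elements of both lists position by position
pairs = list(zip(xs, ys))

print(first_two, reversed_xs, zs, pairs[:2])
```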

## Dictionaries

In [11]:
from collections import Counter
# Counter
c = Counter(['eggs', 'ham', 'eggs'])
c.update(['eggs', 'cheese'])
print(c)
print(c.most_common(2))

Counter({'eggs': 3, 'ham': 1, 'cheese': 1})
[('eggs', 3), ('ham', 1)]
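
Some related dictionary idioms, as a small sketch. The data (`prices`, the word list) is made up for the example.

```python
from collections import Counter, defaultdict

# dict.get() returns a default instead of raising KeyError
prices = {'eggs': 2.5, 'ham': 4.0}
cheese_price = prices.get('cheese', 0.0)

# defaultdict creates the missing value on first access
by_initial = defaultdict(list)
for word in ['eggs', 'ham', 'endive']:
    by_initial[word[0]].append(word)

# Counters can be added and subtracted; counts that drop to zero are removed
total = Counter(['eggs', 'ham']) + Counter(['eggs'])
left = total - Counter(['ham'])

print(cheese_price, dict(by_initial), total, left)
```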


## String formatting

[pyformat.info](https://pyformat.info/)

In [2]:
'{1} {0}'.format('one', 'two')

'two one'

In [7]:
'{{ {} {} }}'.format('one', 2)

'{ one 2 }'

In [6]:
# Adjacent string literals are concatenated; no newline is inserted between them
multiline_string = "a first line"  \
                   "a second line"

In [5]:
# To keep the zeroes at the end
print('{:.2f}'.format(round(2606.89579999999, 2)))
print('{:.2f}'.format(21))

2606.90
21.00
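
With Python 3.6 or later, the same results can be written with f-strings, which accept the same format specifications as `str.format`. A short sketch; the variable names are illustrative.

```python
name = 'one'
number = 2
value = 2606.89579999999

positional = f'{number} {name}'
braces = f'{{ {name} {number} }}'    # doubled braces are literal braces
two_decimals = f'{value:.2f}'        # keeps the trailing zero
padded = f'{number:>5}|{name:<5}|'   # right- and left-aligned in 5 columns

print(positional, braces, two_decimals, padded, sep='\n')
```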


## Loops

In [2]:
some_list = ["bananas", "apples", "mangos"]

In [2]:
for index, value in enumerate(some_list):
    print(value + " is at index " + str(index))

bananas is at index 0
apples is at index 1
mangos is at index 2


In [3]:
some_dict = {'three': 3, 'one': 1, 'two': 2}

In [5]:
for key, value in some_dict.items(): # Python 2.7 : iteritems()
    print(key + " is " + str(value))

three is 3
one is 1
two is 2


In [5]:
# Filter elements with a list comprehension
[x for x in some_list if x != 'bananas']

['apples', 'mangos']
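
A few more iteration helpers, sketched on the same `some_list` and `some_dict`. The result names (`numbered`, `lengths`, `ordered_keys`) are introduced only for this example.

```python
some_list = ["bananas", "apples", "mangos"]
some_dict = {'three': 3, 'one': 1, 'two': 2}

# enumerate() can start counting from any number
numbered = ["{}. {}".format(i, fruit) for i, fruit in enumerate(some_list, start=1)]

# zip() iterates over several sequences in parallel
lengths = {fruit: n for fruit, n in zip(some_list, map(len, some_list))}

# sorted() with a key function walks a dict ordered by value
ordered_keys = [key for key, value in sorted(some_dict.items(), key=lambda kv: kv[1])]

print(numbered, lengths, ordered_keys)
```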

## Files

In [29]:
import os
# Check file existence
os.path.exists('./file_or_link_or_dir_or_sym')
os.path.isdir('./folder/')
os.path.isfile('./file')

False

In [39]:
os.listdir('.')
#os.remove("dir_or_file_or_etc")

['python_cheatsheet.ipynb', '__main__.log', '.ipynb_checkpoints']
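
The same checks can be written with `pathlib`, the object-oriented alternative to `os.path`. This sketch works in a temporary directory so that it leaves nothing behind; the file name `notes.txt` is arbitrary.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    target = folder / 'notes.txt'    # '/' joins path components

    exists_before = target.exists()  # nothing written yet
    target.write_text('hello')       # creates the file
    is_file = target.is_file()
    is_dir = folder.is_dir()

    # iterdir() plays the role of os.listdir(), yielding Path objects
    names = sorted(p.name for p in folder.iterdir())

    target.unlink()                  # counterpart of os.remove() for a file
    exists_after = target.exists()

print(exists_before, is_file, is_dir, names, exists_after)
```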

In [None]:
import shutil
import os
def copytree(src, dst, symlinks=False, ignore=None):
    for item in os.listdir(src):
        s = os.path.join(src, item)
        d = os.path.join(dst, item)
        if os.path.isdir(s):
            shutil.copytree(s, d, symlinks, ignore)
        else:
            shutil.copy2(s, d)
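
A helper like this is presumably needed because `shutil.copytree` historically refused to copy into an existing destination. Since Python 3.8 the `dirs_exist_ok` parameter covers that case directly. A minimal sketch in a temporary directory, with made-up folder and file names:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'src')
    dst = os.path.join(tmp, 'dst')
    os.makedirs(os.path.join(src, 'sub'))
    os.makedirs(dst)  # the destination already exists

    with open(os.path.join(src, 'sub', 'a.txt'), 'w') as handle:
        handle.write('content')

    # Python 3.8+: merge src into the existing dst instead of raising FileExistsError
    shutil.copytree(src, dst, dirs_exist_ok=True)

    copied = os.path.isfile(os.path.join(dst, 'sub', 'a.txt'))

print(copied)
```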


In [6]:
# source: https://gist.github.com/seanh/93666
import string

def format_filename(s):
    """Take a string and return a valid filename constructed from the string.

    Uses a whitelist approach: any characters not present in valid_chars are
    removed. Also spaces are replaced with underscores.

    Note: this method may produce invalid filenames such as ``, `.` or `..`
    When I use this method I prepend a date string like '2009_01_15_19_46_32_'
    and append a file extension like '.txt', so I avoid the potential of using
    an invalid filename.
    """
    valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
    filename = ''.join(c for c in s if c in valid_chars)
    filename = filename.replace(' ', '_')  # I don't like spaces in filenames.
    return filename

## Functions

In [34]:
def a_function():
    print("myfunction")
 
if __name__ == '__main__':  # Runs only when the file is executed directly, not when it is imported as a module
    a_function()

myfunction


In [38]:
def multi_parameters_func(*params):
    print("params : {}".format(', '.join(params)))
multi_parameters_func("first_param", "second_param")
# other example: see benchmark function below ;)

params : first_param, second_param
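
Besides `*params`, a function can take keyword-only arguments and collect arbitrary keyword arguments with `**`. A small sketch; the function `describe` and its parameters are hypothetical names chosen for the example.

```python
def describe(name, *params, separator=', ', **options):
    """Build a description from positional and keyword arguments."""
    parts = [name] + list(params)
    # Sort the keyword arguments so the output does not depend on call order
    parts += ['{}={}'.format(key, value) for key, value in sorted(options.items())]
    return separator.join(parts)

plain = describe('first_param', 'second_param')
with_options = describe('cmd', 'x', verbose=True, level=2)
custom = describe('a', 'b', separator=' | ')

# A list or a dict can be unpacked into arguments with * and **
unpacked = describe(*['a', 'b'], **{'k': 'v'})

print(plain, with_options, custom, unpacked, sep='\n')
```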


## Functional

[reduce_fold_left_python_haskell](https://eli.thegreenplace.net/2017/right-and-left-folds-primitive-recursion-patterns-in-python-and-haskell/)

In [19]:
xs = [1,2,3,4,5]
ys = [6,7,8,9,10]

In [20]:
list(map(lambda a: a*10, xs))

[10, 20, 30, 40, 50]

In [21]:
# map with 2 arguments
list(map(lambda x,y: x+y, xs,ys))

[7, 9, 11, 13, 15]

In [22]:
list(filter(lambda x: x%2==0,xs))

[2, 4]

In [24]:
import functools
functools.reduce(lambda acc, x: acc+x, xs, 0) # (((((0+1)+2)+3)+4)+5) == fold left (foldl) in Haskell

15

In [2]:
# Partial functions (e.g. useful to use with map when needing to pass a param)
import functools
def multiply(x, y):
    return x * y

dbl = functools.partial(multiply,2) # create a new function that multiplies by 2
print(dbl(4))

8
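
The comment above mentions combining `partial` with `map`; a sketch of that combination follows, next to the equivalent comprehensions and an `operator`-based `reduce`. All names other than `multiply` and `xs` are illustrative.

```python
import functools
import operator

xs = [1, 2, 3, 4, 5]

def multiply(x, y):
    return x * y

# partial() fixes the first argument, so map() only supplies the second one
doubled = list(map(functools.partial(multiply, 2), xs))

# The same kind of results written as comprehensions
doubled_comp = [multiply(2, x) for x in xs]
evens = [x for x in xs if x % 2 == 0]

# operator.add replaces the lambda in reduce()
total = functools.reduce(operator.add, xs, 0)

print(doubled, doubled_comp, evens, total)
```

Comprehensions are often considered more readable than `map`/`filter` with a lambda, so the choice is mostly a matter of style.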


## Logs

In [10]:
import logging

def create_logger(loglevel):
    # Map a level name such as "info" to its numeric value; unknown names raise ValueError
    numeric_level = getattr(logging, loglevel.upper(), None)
    if not isinstance(numeric_level, int):
        raise ValueError('Invalid log level: %s' % loglevel)

    #import sys
    #import os
    #module_name = str(os.path.basename(sys.modules['__main__'].__file__)).split('.')[0]
    module_name = __name__
    
    logger = logging.getLogger(module_name)
    logger.setLevel(numeric_level)
    # create file handler which logs even debug messages
    fh = logging.FileHandler(module_name + '.log')
    fh.setLevel(logging.DEBUG)
    # create console handler with a higher log level
    ch = logging.StreamHandler()
    ch.setLevel(logging.INFO)
    # create formatter and add it to the handlers
    formatter = logging.Formatter('%(asctime)s\t%(name)s\t%(levelname)s\t\t%(message)s')
    fh.setFormatter(formatter)
    ch.setFormatter(formatter)
    # add the handlers to the logger
    logger.addHandler(fh)
    logger.addHandler(ch)
    logger.info("Logger created!")
    return logger

logger = create_logger("info")
logger.info("This is a log")

2018-07-23 15:30:33,168	__main__	INFO		Logger created!
2018-07-23 15:30:33,172	__main__	INFO		This is a log
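
One caveat with a helper like `create_logger`: `logging.getLogger` returns the same object for a given name, so calling the helper twice attaches the handlers twice and each message is then emitted twice. A minimal sketch of a guard, using the hypothetical name `get_console_logger` and only a console handler to keep it short:

```python
import logging

def get_console_logger(name, level=logging.INFO):
    """Return a logger with a single console handler (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only the first time this logger is configured
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter('%(asctime)s\t%(name)s\t%(levelname)s\t%(message)s'))
        logger.addHandler(handler)
    return logger

first = get_console_logger('pycheat_demo')
second = get_console_logger('pycheat_demo')
first.info("Emitted once, even though the helper was called twice")
```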


## Various tools

In [13]:
def benchmark(func, *params):
    import datetime
    import time
    start_time = time.time()
    return_value = func(*params) if params else func()
    total_time = datetime.timedelta(seconds=time.time() - start_time)
    print("Function " + func.__name__ + " - execution time : " + str(total_time))#.strftime('%H:%M:%S'))
    return return_value

def test():
    total = 0
    for i in range(0, 10000):
        total +=i
    return total

def add(param1, param2):  # not named "sum", which would shadow the built-in used further down
    return param1 + param2

result = benchmark(add, 1, 2)
print("Result : " + str(result))

result = benchmark(test)
print("Result : " + str(result))

Function add - execution time : 0:00:00.000002
Result : 3
Function test - execution time : 0:00:00.000820
Result : 49995000
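
The same idea can be packaged as a decorator, so the timing is applied where the function is defined rather than at each call. This is a sketch, not a replacement for `%timeit`: the names `timed` and `total_up_to` are hypothetical, and `time.perf_counter` is used here because it is the clock intended for measuring short intervals.

```python
import functools
import time

def timed(func):
    """Decorator that prints the execution time of each call (hypothetical name)."""
    @functools.wraps(func)  # keep the wrapped function's __name__ and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print("Function {} - execution time : {:.6f} s".format(func.__name__, elapsed))
        return result
    return wrapper

@timed
def total_up_to(limit):
    total = 0
    for i in range(limit):
        total += i
    return total

result = total_up_to(10000)
print("Result : " + str(result))
```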


In [42]:
import math
def entropy(string):
    "Calculates the Shannon entropy of a string"

    # get probability of chars in string
    prob = [float(string.count(c)) / len(string) for c in dict.fromkeys(list(string))]

    # calculate the entropy
    entropy = - sum([p * math.log(p) / math.log(2.0) for p in prob])

    return entropy

print(entropy("www.google.com"))

2.8423709931771093
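
An equivalent formulation counts the characters once with `collections.Counter` instead of calling `count()` for every distinct character, and uses `math.log2` directly. A sketch under one assumption stated in the code: the empty string is given an entropy of zero. The name `shannon_entropy` is chosen here to avoid clashing with the function above.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0  # assumption: the empty string has zero entropy
    counts = Counter(text)  # a single pass over the string
    length = len(text)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())

print(shannon_entropy("www.google.com"))
print(shannon_entropy("aabb"))
```

Two equally likely symbols, as in `"aabb"`, give exactly one bit per character.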


In [48]:
def is_ipv4(ipv4_string):
    l = ipv4_string.split('.')
    if len(l) != 4:
        return False
    try:
        ip = list(map(int, l))
    except ValueError:
        return False
    if len(list(filter(lambda x: 0 <= x <= 255, ip))) == 4:
        return True
    return False

# True
print(is_ipv4("192.168.1.1"))
print(is_ipv4("0.0.0.0"))
print(is_ipv4("255.255.255.255"))

# False
print(is_ipv4("255.255.255"))
print(is_ipv4("255.255.255.255.3"))
print(is_ipv4("255.255.255.erzr"))


True
True
True
False
False
False
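
The standard library can do this validation too: `ipaddress.IPv4Address` raises a `ValueError` subclass for anything that is not a dotted-quad address. A sketch with the hypothetical name `is_ipv4_stdlib`, run on the same inputs as above. It appears to be stricter than the hand-written version on inputs such as a sign inside an octet, which `int()` tolerates.

```python
import ipaddress

def is_ipv4_stdlib(ipv4_string):
    """Return True if the string parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(ipv4_string)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True

candidates = ["192.168.1.1", "0.0.0.0", "255.255.255.255",
              "255.255.255", "255.255.255.255.3", "255.255.255.erzr"]
results = [is_ipv4_stdlib(candidate) for candidate in candidates]
print(results)
```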


## Pandas dataframes

[Pandas tips and tricks](https://towardsdatascience.com/pandas-tips-and-tricks-33bcc8a40bb9)

In [None]:
import pandas as pd

df = pd.DataFrame(data, columns=features_name)
    
import csv
df.to_csv(c.model_folder + filename, sep=',', encoding='utf-8', index=False, quoting=csv.QUOTE_NONNUMERIC)

#df = pd.read_csv(c.model_folder + "features.csv")

# Show the first 5 rows
df.head(5)

# check the data frame info
df.info()

# Get unique values of a column
df['continent'].unique().tolist()

# To add a new column and set all rows to a specific value
df['Name'] = 'abc'

# Drop columns by position (here the 2nd and 3rd), in place
df.drop(df.columns[[1, 2]], axis=1, inplace=True)

# Drop columns by name (returns a new DataFrame)
df1 = df.drop(['B', 'C'], axis=1)

# Keep only the columns you want
df1 = df[['a','d']]

column_names = df.columns  # df.index holds the row labels
data = df.values

# Where condition
df.loc[df['label'] == 'NORMAL']

# Select column(s)
df[f_name]

# Count occurrences of each distinct value
df[f_name].value_counts()

# Put a column at the end
df_label = df.pop('label') # remove column 'label' and store it in df_label
df['label'] = df_label # add label as a 'new' column.

# To modify a specific cell
df.loc[df['key'] == 'mykey', "column_name"] = 1


# To iterate over rows
for index, row in df.iterrows():  # row is a copy of the row from the dataframe
    print(row['c1'], row['c2'])

    # To modify the row: use the index to access the row in the dataframe
    df.loc[index, 'wgs1984_latitude'] = dict_temp['lat']

# To sum each row (across its columns); this returns a Series
row_sums = df.sum(axis=1)

# To sum a single column
normal_counts = df["col_name"].sum()

## OS

In [None]:
import subprocess
# Run command 
subprocess.Popen(["bro", "-C", "-r", "../"+filename], cwd=working_dir).wait()
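
When the output of the command is needed, `subprocess.run` (with `capture_output` and `text`, available since Python 3.7) is a convenient alternative to `Popen(...).wait()`. The sketch below runs the current Python interpreter as a stand-in for an external command, so it does not depend on any particular tool being installed.

```python
import subprocess
import sys

# Run the current Python interpreter as a stand-in for any external command
completed = subprocess.run(
    [sys.executable, "-c", "print('test')"],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
output = completed.stdout.strip()
print(output)
```

`subprocess.run` also accepts the `cwd` argument used above with `Popen`.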

## Web requests (requests & BeautifulSoup)
- [requests doc](http://docs.python-requests.org/en/master/)
- [BeautifulSoup doc](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

In [3]:
import requests
from bs4 import BeautifulSoup

In [4]:
url = "http://www.test.com"
cookies = {
    'Cookie':'some_value',
}
r = requests.get(url, cookies=cookies)

#print(r.text)

soup = BeautifulSoup(r.content, "html5lib")

In [5]:
#Scan the URLs present on the page
for link in soup.find_all('a'):
    href = link.get('href')
    if str(href).startswith("http"): # To exclude refs that are links to paragraphs on the page (like #maincontent)
        print(href)
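
The `startswith("http")` check also drops relative links such as `/about`. If those are wanted, `urllib.parse.urljoin` can resolve them against the page URL. A standard-library sketch on a hard-coded list of `href` values; the URLs and the list are made up, and in the loop above they would come from `link.get('href')`.

```python
from urllib.parse import urljoin, urlparse

base_url = "http://www.test.com/docs/index.html"
hrefs = ["http://other.example/page", "/about", "img/logo.png", "#maincontent", None]

absolute_links = []
for href in hrefs:
    if not href or href.startswith("#"):
        continue  # skip missing hrefs and in-page anchors
    absolute = urljoin(base_url, href)  # resolve relative paths against the page URL
    if urlparse(absolute).scheme in ("http", "https"):
        absolute_links.append(absolute)

print(absolute_links)
```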

In [1]:
def is_downloadable(url):
    """
    Does the url contain a downloadable resource?
    """
    h = requests.head(url, allow_redirects=True, cookies=cookies)
    # Default to '' so that a missing Content-Type header does not raise AttributeError
    # (in that case the resource is treated as downloadable)
    content_type = h.headers.get('content-type', '').lower()
    if 'text' in content_type:
        return False
    if 'html' in content_type:
        return False
    return True

In [None]:
# Decode URL
from urllib.parse import unquote
url = unquote(url)
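
The reverse operation is `quote`. A short round-trip sketch on a made-up string, showing that `/` is left untouched unless `safe` is overridden:

```python
from urllib.parse import quote, unquote

decoded = unquote("a%20b%2Fc")
encoded = quote("a b/c")               # '/' is kept by default
encoded_all = quote("a b/c", safe="")  # encode '/' as well

print(decoded, encoded, encoded_all)
```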