In [8]:
import pandas as pd
import numpy as np
import csv

# Some useful stuff for copies and pastes

## Reading and Writing to csv

In [17]:
df = pd.read_csv('data/file1.csv', encoding='latin_1')

Other encodings include `utf-8`, `utf-16`, `cp1252` and many others. `latin_1` seems to work well on Windows, especially for an Excel sheet saved as CSV.
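When the encoding of a file is unknown, one option is to try a few candidates in order. The helper below is only a sketch: the name `read_csv_with_fallback` and the candidate order are my own assumptions, not anything pandas provides. `latin_1` goes last because it maps every possible byte, so it never raises and can quietly produce wrong characters.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252", "latin_1")):
    """Hypothetical helper: return (DataFrame, encoding) for the first
    encoding in `encodings` that decodes the file without an error."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            # Strict decoding is the default, so a wrong guess raises
            # UnicodeDecodeError instead of silently substituting characters.
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeDecodeError as err:
            last_error = err
    # Every candidate failed; surface the last decoding error.
    raise last_error
```

Calling `df, used = read_csv_with_fallback('data/file1.csv')` also reports which encoding was used. A successful decode does not prove the guess was right, so it is still worth eyeballing a few rows.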

In [18]:
df.head(3)

Unnamed: 0,a,b,c
0,-0.403026,0.559875,0.755273
1,-1.795062,-0.195291,-0.550756
2,-2.151132,-0.915356,0.848399


In [19]:
df.to_csv('data/file2.csv', index=False, quoting=csv.QUOTE_ALL)
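
The `Unnamed: 0` column in the output above is what typically shows up when a CSV was written with its index and then read back without `index_col`. The round trip below is a small sketch of that, using an in-memory buffer and a made-up two-column frame rather than the files above. It also shows what `quoting=csv.QUOTE_ALL` does to the header.

```python
import csv
import io

import pandas as pd

# Made-up example data, not the contents of data/file1.csv.
df_demo = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Default index=True writes the index as an extra, unnamed first column.
with_index = pd.read_csv(io.StringIO(df_demo.to_csv()))

# index=False leaves the index out; QUOTE_ALL wraps every field in quotes.
quoted_text = df_demo.to_csv(index=False, quoting=csv.QUOTE_ALL)
clean = pd.read_csv(io.StringIO(quoted_text))

print(list(with_index.columns))     # includes 'Unnamed: 0'
print(quoted_text.splitlines()[0])  # quoted header row
print(list(clean.columns))          # only the real columns
```

Reading the first file with `index_col=0` is another way to avoid the extra column.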

## Exclusive deduplication

Sometimes I need to compare two lists of IDs and keep only the IDs that appear in exactly one of them. Normal deduplication functions keep one copy of each duplicated value. I want no copies at all, just the values that occur a single time across both lists combined (the symmetric difference of the two lists).

Behold. In Python, the answer is always a list comprehension.

In [2]:
def dedupe_exclusive(x, y):
    """Return the values that appear in only one of x and y."""
    xs = [i for i in x if i not in y]  # values only in x
    ys = [i for i in y if i not in x]  # values only in y
    return xs + ys

In [3]:
x = np.arange(1, 9)
x

array([1, 2, 3, 4, 5, 6, 7, 8])

In [4]:
y = np.arange(3, 11)
y

array([ 3,  4,  5,  6,  7,  8,  9, 10])

In [5]:
dedupe_exclusive(x, y)

[1, 2, 9, 10]
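
For longer lists, the `in` checks inside the comprehensions scan the other list once per element. Two library alternatives are sketched below. The function names are my own, and both assume each input list holds unique IDs. `np.setxor1d` returns the sorted unique values found in only one input, and `drop_duplicates(keep=False)` drops every value that occurs more than once.

```python
import numpy as np
import pandas as pd


def dedupe_exclusive_xor(x, y):
    """Sorted values present in exactly one of x and y (NumPy set XOR)."""
    return np.setxor1d(x, y).tolist()


def dedupe_exclusive_pandas(x, y):
    """Values that occur exactly once across x and y combined."""
    combined = pd.concat([pd.Series(x), pd.Series(y)], ignore_index=True)
    # keep=False removes all copies of any duplicated value.
    return combined.drop_duplicates(keep=False).tolist()


x = np.arange(1, 9)
y = np.arange(3, 11)
print(dedupe_exclusive_xor(x, y))     # [1, 2, 9, 10]
print(dedupe_exclusive_pandas(x, y))  # [1, 2, 9, 10]
```

The three versions differ when a value repeats within one list. For `x = [1, 1, 2]` and `y = [2]`, the list comprehension returns `[1, 1]`, `np.setxor1d` returns `[1]`, and the pandas version returns `[]`.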