## General form of list comprehension
[f(x) for x in my_list]

Since dataframe columns are Series, and a Series is iterable, a list comprehension can be applied to them directly:

[f(x) for x in df['y']]

If you need to pass multiple columns as parameters to the function, use zip to iterate over them in parallel:

[f(a, b) for a, b in zip(df['a'], df['b'])]

Generating a new column in a dataframe using a list comprehension looks like this:

df['new_column'] = [f(a, b) for a, b in zip(df['a'], df['b'])]
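
The pattern above can be sketched end to end as follows. This is a minimal example, assuming a small made-up dataframe with numeric columns `a` and `b`; the function `f` is hypothetical and simply adds its two arguments.

```python
import pandas as pd

# Hypothetical example data; the column names 'a' and 'b' follow the text above.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})


def f(a, b):
    """Hypothetical row-wise function: here it just adds the two values."""
    return a + b


# Single column: apply an expression to every value in df['a'].
doubled = [x * 2 for x in df['a']]

# Multiple columns: zip pairs up the values row by row.
df['new_column'] = [f(a, b) for a, b in zip(df['a'], df['b'])]

print(doubled)
print(df)
```

The comprehension must produce exactly one value per row, otherwise the assignment to `df['new_column']` raises a length-mismatch error.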



## List comprehension to fix formatting problems with column names


In [None]:
# Badly formatted column names are an extremely common problem in data sets.
# Here is a simple way to normalize them using a list comprehension.

# First make a list of your column names
cols = list(df.columns)

# Run a comprehension on the list to lowercase each name and strip surrounding whitespace
cols = [x.lower().strip() for x in cols]

# Replace the column names in the dataframe
df.columns = cols

df
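
To see the effect of the cleanup, here is a small self-contained sketch. The dataframe and its messy column names are invented for illustration; only the comprehension itself comes from the cell above.

```python
import pandas as pd

# Hypothetical dataframe with inconsistent case and stray whitespace in the names.
df = pd.DataFrame({' GRE Score': [337], 'TOEFL Score ': [118], 'CGPA': [9.65]})

before = list(df.columns)

# Same comprehension as above: lowercase each name and strip surrounding whitespace.
df.columns = [x.lower().strip() for x in df.columns]

after = list(df.columns)

print(before)
print(after)
```

Note that `.lower()` and `.strip()` are string methods, so this assumes every column label is a string; integer labels, for example, would raise an `AttributeError`.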

## Replacing specific column names

In [None]:
# Columns can be renamed by passing a dictionary of {old column name: new column name} to rename().
# rename() returns a new dataframe; df itself is left unchanged.
new_df = df.rename(columns={'SOP': 'Statement of Purpose',
                            'LOR': 'Letter of Recommendation'})
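
A short sketch of how `rename` behaves, using an invented dataframe that contains the `SOP` and `LOR` columns from the cell above. The `'Missing'` key is hypothetical and is included only to show that, by default, keys not found in the dataframe are ignored without error.

```python
import pandas as pd

# Hypothetical data; only the 'SOP' and 'LOR' column names come from the text above.
df = pd.DataFrame({'SOP': [4.5, 4.0], 'LOR': [4.5, 3.5], 'CGPA': [9.65, 8.87]})

new_df = df.rename(columns={'SOP': 'Statement of Purpose',
                            'LOR': 'Letter of Recommendation',
                            'Missing': 'Ignored'})  # no such column: skipped silently

print(list(new_df.columns))  # renamed copy
print(list(df.columns))      # original dataframe is unchanged
```

Columns that are not listed in the dictionary, such as `CGPA` here, keep their names.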

## Showing just a few columns from a large dataframe

In [None]:
# For example, to show the first 3 rows and the first 6 columns of a dataframe:
df.iloc[:3, :6]
# Any row and column slices can be substituted in [rows, columns].

# The following shows 4 random columns from the dataset
df.sample(n=4, axis=1)
# and this shows 4 random rows
df.sample(n=4, axis=0)
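
The slicing and sampling calls above can be tried on a small stand-in for a large dataframe. This is only a sketch: the 5-row, 8-column dataframe and its column names are made up, and `random_state` is added purely so the random selection is repeatable.

```python
import pandas as pd

# Hypothetical 5-row, 8-column dataframe standing in for a large data set.
df = pd.DataFrame({f'col{i}': [i * 10 + r for r in range(5)] for i in range(8)})

# First 3 rows and first 6 columns, selected by position.
first_block = df.iloc[:3, :6]

# 4 random columns (axis=1) and 4 random rows (axis=0).
random_cols = df.sample(n=4, axis=1, random_state=0)
random_rows = df.sample(n=4, axis=0, random_state=0)

print(first_block.shape)
print(random_cols.shape)
print(random_rows.shape)
```

`n` cannot exceed the number of rows or columns being sampled unless `replace=True` is passed.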

## Some SQLAlchemy tricks

In [None]:
# Assuming a reflected database, you can get the list of column names for a table like so:
Base.metadata.tables['SurveySession'].columns.keys()
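
For context, here is one way `Base` might come to hold a reflected database. This is a sketch under several assumptions: it uses SQLAlchemy's automap extension with the `autoload_with` argument (SQLAlchemy 1.4 or later), an in-memory SQLite database, and an invented schema for the `SurveySession` table. The original notebook may have set up `Base` differently.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.ext.automap import automap_base

# In-memory SQLite database with a hypothetical SurveySession table.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE SurveySession ("
        "id INTEGER PRIMARY KEY, respondent TEXT, started_at TEXT)"
    ))

# Reflect the existing schema into Base.metadata.
Base = automap_base()
Base.prepare(autoload_with=engine)

# Same lookup as in the cell above.
column_names = Base.metadata.tables['SurveySession'].columns.keys()
print(column_names)
```

`Base.metadata.tables` is keyed by table name, so `list(Base.metadata.tables)` is a quick way to check which tables were reflected.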
