# Python Basics

Set up a JupyterLab environment.

**environment.yml**

```{yml}
name: jup_env
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - python=3.8
  - pandas
  - jupyterlab
```

In bash:

```{bash}
conda env create -f environment.yml
conda activate jup_env
jupyter lab      # Starts browser notebook
```

In [1]:
print("Hello World")

Hello World


## Data Types

Data is stored in variables. Much of programming comes down to creating, moving, and modifying variables.


In [2]:
i = 5            # Integers 1, 2, 3, ...
f = 1.1          # Floats 1.1, 1.1111, 3.14
d = 1/3          # Also a float (double precision): 0.3333333...
s = "hello"      # Strings
print(i) 
print(f)
print(d)
print(s)

5
1.1
0.3333333333333333
hello


List all variables in the current session with IPython's `whos` command:

In [3]:
whos

Variable   Type     Data/Info
-----------------------------
d          float    0.3333333333333333
f          float    1.1
i          int      5
s          str      hello
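
Outside IPython, the built-in `type()` function gives the same information for a single value. The following is a minimal sketch that reuses the variable names from the cell above; the conversion variables (`as_text`, `as_float`, `as_int`) are hypothetical names added for illustration.

```python
i = 5
f = 1.1
d = 1 / 3
s = "hello"

# type() reports the class of each value; d is a float, not a separate "double" type
for name, value in [("i", i), ("f", f), ("d", d), ("s", s)]:
    print(name, type(value).__name__, value)

# Values can be converted between types explicitly
as_text = str(i)       # int -> str
as_float = float(i)    # int -> float
as_int = int("42")     # str -> int
```

Explicit conversion matters most when reading files, where numbers often arrive as text.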


## Reading and Writing Files

You will probably need to read or write files, most often in csv, tsv, or txt format. Use the `pandas` library to create a data frame.


In [4]:
import pandas as pd
data = pd.read_csv("data/small.txt")  # read from a file
data.to_csv('data/new_small.csv')     # write to a file
data

| | color | count |
|--:|:--|--:|
| 0 | red | 11 |
| 1 | white | 22 |
| 2 | blue | 33 |
| 3 | green | 44 |
| 4 | purple | 55 |
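
One detail worth knowing: `to_csv` writes the row index by default, and that extra column comes back as `Unnamed: 0` when the file is read again. The sketch below shows this with in-memory text instead of real files, so `csv_text` is a made-up stand-in for `data/small.txt` and all variable names are assumptions.

```python
import io
import pandas as pd

# Hypothetical stand-in for data/small.txt so the example is self-contained
csv_text = "color,count\nred,11\nwhite,22\nblue,33\n"
data = pd.read_csv(io.StringIO(csv_text))

# With no path argument, to_csv returns the CSV text as a string.
with_index = data.to_csv()                # row index written as an unnamed first column
without_index = data.to_csv(index=False)  # only the real columns are written

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the first version back produces an extra "Unnamed: 0" column
reread = pd.read_csv(io.StringIO(with_index))
print(list(reread.columns))

# Tab-separated files use the same functions with sep="\t"
tsv_text = data.to_csv(sep="\t", index=False)
roundtrip = pd.read_csv(io.StringIO(tsv_text), sep="\t")
```

Passing `index=False` when writing is usually the simplest way to keep the file identical in shape to the data frame.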


The `data` object is a data frame, somewhat similar to R's data.frame.

In [5]:
type(data) 

pandas.core.frame.DataFrame

In [6]:
data.columns    # equivalent to R's names(data)


Index(['color', 'count'], dtype='object')

In [7]:
whos

Variable   Type         Data/Info
---------------------------------
d          float        0.3333333333333333
data       DataFrame        color  count\n0     r<...>     44\n4  purple     55
f          float        1.1
i          int          5
pd         module       <module 'pandas' from '/U<...>ages/pandas/__init__.py'>
s          str          hello


## Define your own Python functions

Use the `def` keyword to create a function that takes input and gives output. Use functions to modify, combine, and compute on variables.

In [8]:
def hello_function(name):
    print(f"Hello {name}!")

hello_function("Mark")

Hello Mark!
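
The function above prints its result. To give output that other code can use, a function returns a value with `return`. A small sketch, where `make_greeting` and `mean_of` are hypothetical names chosen for illustration:

```python
def make_greeting(name, greeting="Hello"):
    """Return a greeting string instead of printing it."""
    return f"{greeting} {name}!"


def mean_of(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


message = make_greeting("Mark")           # the returned value is stored in a variable
average = mean_of([11, 22, 33, 44, 55])   # and can be reused in later computations
print(message)
print(average)
```

Returning values rather than printing them makes it possible to combine functions, for example by passing the result of one into another.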


## Define your own Python classes

The key idea of object-oriented programming is to group data and the methods that act on it in a structure called a class. A class is like a blueprint for a house: it describes what data each object will hold and what it can do, but it holds no values of its own yet. When you instantiate the class, Python allocates memory for a new object and fills in its values (you don't need to know much about the memory details). That concrete instance is called an object.

Use the `class` keyword to create a class with internal data and functions (also referred to as methods):

In [9]:
class MyPerson:
    
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
    def getAge(self):
        return "I am " + str(self.age) + " years old."

In [10]:
mark = MyPerson("Mark", 40)         # Instantiate a class to create an object
mark.age                            # Access object data
mark.getAge()                       # Call a method of the object

'I am 40 years old.'
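
To make the blueprint idea concrete, the sketch below builds two objects from one class and shows that each keeps its own data. `Counter` is a hypothetical class invented for this illustration, not part of any library used here.

```python
class Counter:
    """Hypothetical example class: keeps a running total per object."""

    def __init__(self, label):
        self.label = label
        self.count = 0

    def add(self, step=1):
        # Methods read and modify the data of the object they are called on
        self.count += step
        return self.count


reds = Counter("red")     # first object built from the blueprint
blues = Counter("blue")   # second, independent object

reds.add()
reds.add(2)
blues.add()

print(reds.label, reds.count)
print(blues.label, blues.count)
print(isinstance(reds, Counter))
```

Changing `reds` leaves `blues` untouched, because the class only describes the layout while each object stores its own values.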

Python is an object-oriented language, so a typical analysis pipeline is (1) importing packages, (2) creating objects, and (3) calling methods on them with `object.method()`.

Use `dir()` to view all attributes (internal data) and methods of an object. 

In [11]:
dir(mark)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'age',
 'getAge',
 'name']

Attributes and methods whose names start with an underscore are private (or internal) by convention; avoid using them directly. Use the other attributes (`age`, `getAge`, `name`).
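
The long `dir()` listing can be filtered down to the public names with a list comprehension. This is a minimal sketch that redefines a class like `MyPerson` so it runs on its own; `public_names` is a hypothetical variable name.

```python
class MyPerson:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def getAge(self):
        return f"I am {self.age} years old."


mark = MyPerson("Mark", 40)

# Keep only names that do not start with an underscore
public_names = [n for n in dir(mark) if not n.startswith("_")]
print(public_names)

# vars() shows just the data stored on this particular object
print(vars(mark))
```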

## Case Study: Gapminder

[add description of dataset here with link to sources]

In [12]:
data = pd.read_csv("data/gapminder_all.csv", index_col=0)

# data[1:10, 1:5]       # <= will NOT work!
data.iloc[1:10,1:5]     # iloc = integer location

| continent | gdpPercap_1952 | gdpPercap_1957 | gdpPercap_1962 | gdpPercap_1967 |
|:--|--:|--:|--:|--:|
| Africa | 3520.610273 | 3827.940465 | 4269.276742 | 5522.776375 |
| Africa | 1062.7522 | 959.60108 | 949.499064 | 1035.831411 |
| Africa | 851.241141 | 918.232535 | 983.653976 | 1214.709294 |
| Africa | 543.255241 | 617.183465 | 722.512021 | 794.82656 |
| Africa | 339.296459 | 379.564628 | 355.203227 | 412.977514 |
| Africa | 1172.667655 | 1313.048099 | 1399.607441 | 1508.453148 |
| Africa | 1071.310713 | 1190.844328 | 1193.068753 | 1136.056615 |
| Africa | 1178.665927 | 1308.495577 | 1389.817618 | 1196.810565 |
| Africa | 1102.990936 | 1211.148548 | 1406.648278 | 1876.029643 |


In [13]:
data.loc[:,"country":"gdpPercap_1962"].head()   # loc = label based location

| continent | country | gdpPercap_1952 | gdpPercap_1957 | gdpPercap_1962 |
|:--|:--|--:|--:|--:|
| Africa | Algeria | 2449.008185 | 3013.976023 | 2550.81688 |
| Africa | Angola | 3520.610273 | 3827.940465 | 4269.276742 |
| Africa | Benin | 1062.7522 | 959.60108 | 949.499064 |
| Africa | Botswana | 851.241141 | 918.232535 | 983.653976 |
| Africa | Burkina Faso | 543.255241 | 617.183465 | 722.512021 |
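
The two indexers also differ in how they treat the end of a slice: `iloc` excludes the end position, as ordinary Python slicing does, while `loc` includes the end label. The sketch below uses a small made-up data frame (`toy`, with invented labels and values, not the Gapminder data) to show the difference.

```python
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "x_1952": [1.0, 2.0, 3.0, 4.0],
        "x_1957": [1.5, 2.5, 3.5, 4.5],
    },
    index=["r0", "r1", "r2", "r3"],
)

# iloc: integer positions, end position excluded (rows 1-2, columns 0-1)
by_position = toy.iloc[1:3, 0:2]

# loc: labels, end label included (rows r1 through r3, columns country through x_1952)
by_label = toy.loc["r1":"r3", "country":"x_1952"]

print(by_position.shape)
print(by_label.shape)
```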


## Tidyverse-flavored Python

| R tidyverse command | Python (pandas) command |
|:--|:--|
| `group_by(g)` | `df.groupby("g")` |
| `summarize(avg = mean(col1))` | `.mean()` or `.agg(avg=("col1", "mean"))` |
| `mutate(newcol = col1 + col2)` | `df.assign(newcol=lambda d: d["col1"] + d["col2"])` |
| `pivot_longer` | `df.melt()` |
| `ungroup` | usually not needed; `reset_index()` moves group keys back into columns |
| `filter(col > 80)` | `df.loc[df["col"] > 80]` |
| `select(col1, col2)` | `df[["col1", "col2"]]` |
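
The pandas side of the table can be tried on a small made-up data frame. This is a sketch under assumptions: the column names (`group`, `col1`, `col2`) and values are invented, and named aggregation in `agg` needs a reasonably recent pandas release.

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b"],
        "col1": [10, 90, 85, 95],
        "col2": [1, 2, 3, 4],
    }
)

# group_by + summarize: one row per group, then move the key back into a column
summary = df.groupby("group").agg(avg=("col1", "mean")).reset_index()

# mutate: add a column computed from existing ones
mutated = df.assign(newcol=lambda d: d["col1"] + d["col2"])

# filter: keep rows matching a condition
filtered = df.loc[df["col1"] > 80]

# select: keep a subset of columns
selected = df[["col1", "col2"]]

# pivot_longer: reshape from wide to long
long = df.melt(id_vars="group", value_vars=["col1", "col2"],
               var_name="variable", value_name="value")

print(summary)
print(long.shape)
```

None of these calls modify `df` in place; each returns a new data frame, which is close to how tidyverse verbs behave.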



In [14]:
data = pd.read_csv("data/gapminder_all.csv", index_col=0)
summarydata = data.groupby("country").mean().head()
summarydata

| country | gdpPercap_1952 | gdpPercap_1957 | gdpPercap_1962 | gdpPercap_1967 | gdpPercap_1972 | gdpPercap_1977 | gdpPercap_1982 | gdpPercap_1987 | gdpPercap_1992 | gdpPercap_1997 | ... | pop_1962 | pop_1967 | pop_1972 | pop_1977 | pop_1982 | pop_1987 | pop_1992 | pop_1997 | pop_2002 | pop_2007 |
|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| Afghanistan | 779.445314 | 820.85303 | 853.10071 | 836.197138 | 739.981106 | 786.11336 | 978.011439 | 852.395945 | 649.341395 | 635.341351 | ... | 10267083.0 | 11537966.0 | 13079460.0 | 14880372.0 | 12881816.0 | 13867957.0 | 16317921.0 | 22227415.0 | 25268405 | 31889923 |
| Albania | 1601.056136 | 1942.284244 | 2312.888958 | 2760.196931 | 3313.422188 | 3533.00391 | 3630.880722 | 3738.932735 | 2497.437901 | 3193.054604 | ... | 1728137.0 | 1984060.0 | 2263554.0 | 2509048.0 | 2780097.0 | 3075321.0 | 3326498.0 | 3428038.0 | 3508512 | 3600523 |
| Algeria | 2449.008185 | 3013.976023 | 2550.81688 | 3246.991771 | 4182.663766 | 4910.416756 | 5745.160213 | 5681.358539 | 5023.216647 | 4797.295051 | ... | 11000948.0 | 12760499.0 | 14760787.0 | 17152804.0 | 20033753.0 | 23254956.0 | 26298373.0 | 29072015.0 | 31287142 | 33333216 |
| Angola | 3520.610273 | 3827.940465 | 4269.276742 | 5522.776375 | 5473.288005 | 3008.647355 | 2756.953672 | 2430.208311 | 2627.845685 | 2277.140884 | ... | 4826015.0 | 5247469.0 | 5894858.0 | 6162675.0 | 7016384.0 | 7874230.0 | 8735988.0 | 9875024.0 | 10866106 | 12420476 |
| Argentina | 5911.315053 | 6856.856212 | 7133.166023 | 8052.953021 | 9443.038526 | 10079.02674 | 8997.897412 | 9139.671389 | 9308.41871 | 10967.28195 | ... | 21283783.0 | 22934225.0 | 24779799.0 | 26983828.0 | 29341374.0 | 31620918.0 | 33958947.0 | 36203463.0 | 38331121 | 40301927 |
