In [1]:
# print the raw JSON of a notebook to see how its cells are stored
with open('01_data_manipulation_and_representation.ipynb','r') as IN :
    for line in IN:
        print(line , end='')

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# importing, manipulating, and representing data \n",
    "\n",
    "The basis of any statistical analysis is the underlying data.\n",
    "\n",
    "A data-set is typically presented as a file containing information formatted as a table:\n",
    " * each line correspond to an observation ( individual, sample, ... )\n",
    " * each column correspond to a measured variable ( height, sex, gene expression, ... )\n",
    "\n",
    "\n",
    "To read data file and manipulate the data, we will rely on [pandas](https://pandas.pydata.org/)\n",
    "Pandas is a \"high-level\" module, designed for statistics/exploratory analysis.\n",
    "A great strength of pandas is its **DataFrame** which emulates many of the convenient behavior and syntax of their eponym counterpart in the **R** language.\n",
    "\n",
    "\n",
    "To graphically represent the data, we will rely on [seaborn](https://seaborn.pydata.org/ind

In [52]:
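def isTitle( n , numbers = False , exercises=False ):
    """Return True if the line n is a markdown title.

    numbers   : keep only titles whose text starts with a digit.
    exercises : together with numbers, also keep titles starting with "exercise".
    """
    if not n.startswith('#') :
        return False
    if not numbers :
        return True

    text = n.rpartition('#')[2].strip() # title text after the last '#'
    if text[:1].isdigit() : # text[:1] is '' for an empty title, so no IndexError
        return True
    return exercises and text.lower().startswith("exercise")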

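def genTocCell( items , titleLevels ):
    """Build a markdown cell holding the table of contents.

    items[i] is the text of title i and titleLevels[i] its number of '#'.
    Each entry links to the anchor whose id is the title index i.
    """
    toc = {
        "cell_type": "markdown",
        "metadata": {},
        "source": []
    }
    toc['source'].append( "# Table of Contents <a id='toc'></a>\n" )

    for i, item in enumerate(items):
        toc['source'].append( "\n\n" )
        # indent each entry by 4 non-breaking spaces per title level
        toc['source'].append( titleLevels[i]*4*'&nbsp;' +  "[{}](#{})".format(item,i) )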

    return toc
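
As a quick sanity check of the title filter, the sketch below applies the same rule as isTitle to a few lines. It restates the function in compact form under the hypothetical name isTitleSketch so that it runs on its own; the sample lines are made up for illustration and are not taken from the course notebooks.

```python
def isTitleSketch(line, numbers=False, exercises=False):
    # Compact restatement of the isTitle rule, so this block is self-contained.
    if not line.startswith('#'):
        return False
    if not numbers:
        return True
    text = line.rpartition('#')[2].strip()  # title text after the last '#'
    if text[:1].isdigit():
        return True
    return exercises and text.lower().startswith('exercise')

# Hypothetical markdown lines, for illustration only.
samples = [
    "# 1. correlation\n",
    "## Exercise 01\n",
    "## Some remarks\n",
    "plain text with a # inside\n",
]

# Keep numbered titles and titles starting with "Exercise".
kept = [s.strip() for s in samples if isTitleSketch(s, numbers=True, exercises=True)]
print(kept)
```

With numbers=True and exercises=True, only the numbered title and the exercise title are kept; the unnumbered title would be kept only with the default numbers=False.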
    

In [60]:

import json

FN = '04_correlation_and_regression.ipynb' # notebook to process
FN_out = FN.rpartition('.')[0] + '.withTOC.ipynb' # output file, written next to the input
   
with open(FN,'r') as IN :
    NB = json.load(IN)

backToToc = ['\n','[back to the toc](#toc)\n','\n','<br>\n','\n']

titleList = []
titleLevels = []

for cell in NB['cells']:
    if cell['cell_type'] == "markdown" :
        titles = {} # k : line number in cell, v: id of title
        titlesL = []
        for i,l in enumerate( cell['source'] ):
            if isTitle( l , numbers=True , exercises = True ): # keep numbered titles or starting with "Exercise"
                
                if l.rstrip().endswith("</a>") : # there is already an anchor (the last line of a cell has no '\n')
                    l = l.partition('<a id=')[0].rstrip() # remove it
                    cell['source'][i] = l
                
                titles[i] = len(titleList)
                titlesL.append(i)
                titleList.append( l.strip('#').strip() )
                titleLevels.append( len( l.partition(' ')[0] ) )

        for i in titlesL[::-1]:
            
            # adding an id to the title     
            cell['source'][i] = cell['source'][i].strip('\n') + " <a id='{}'></a>\n".format(titles[i])
            
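            # adding a link to the ToC IF not already present
            # (the link is item 1 of backToToc, so it sits 4 lines above the title;
            #  i < 4 is tested first so that a negative index does not wrap around)
            if i < 4 or cell['source'][i-4] != backToToc[1]:
                cell['source'] = cell['source'][:i] + backToToc + cell['source'][i:]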
        #if len(titles) >0:
        #    print( ''.join( cell['source'] ) )
            
NB['cells'].insert(0 , genTocCell(titleList,titleLevels))

with open( FN_out  , 'w' ) as OUT:
    json.dump(NB , OUT)        

print( '\n'.join(titleList) )

1. correlation
1.1. Pearson's (linear) correlation
1.2. Spearman's (rank) correlation coefficient
1.3. Significance of Pearson and Spearman correlation coefficient.
1.4 Kendall tau correlation coefficient (for fun)
Exercise 01
1.5 Correlation and causation
2.Linear regression
2.1.Presentation
2.2.Underlying hypothesis
2.3. Goodness of fit
2.4. Confidence interval and test statistics
2.5. Maximum Likelihood
2.6. Model choosing
2.7. What to do when some hypothesis about OLS are not true
Exercise 02
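
One way to check the result is to verify that every link in the table of contents points to an anchor that exists in the notebook. The sketch below does this on a small in-memory notebook dictionary. The helper name missingAnchors and the toy notebook are assumptions for illustration, and the regular expressions only cover the `[text](#id)` and `<a id='id'></a>` forms written by the code above.

```python
import re

def missingAnchors(nb):
    """Return the ids linked from the first cell that have no matching anchor."""
    tocText = ''.join(nb['cells'][0]['source'])
    linked = re.findall(r"\]\(#([^)]+)\)", tocText)  # ids used in [text](#id)
    body = ''.join(''.join(c['source'])
                   for c in nb['cells'] if c['cell_type'] == 'markdown')
    anchors = set(re.findall(r"<a id='([^']+)'></a>", body))  # ids that exist
    return [i for i in linked if i not in anchors]

# Toy notebook (hypothetical): the ToC links to titles 0 and 1,
# but only title 0 carries an anchor.
toy = {"cells": [
    {"cell_type": "markdown", "metadata": {}, "source": [
        "# Table of Contents <a id='toc'></a>\n",
        "\n\n", "&nbsp;&nbsp;&nbsp;&nbsp;[1. correlation](#0)",
        "\n\n", "&nbsp;&nbsp;&nbsp;&nbsp;[2.Linear regression](#1)"]},
    {"cell_type": "markdown", "metadata": {}, "source": [
        "\n", "[back to the toc](#toc)\n", "\n", "<br>\n", "\n",
        "# 1. correlation <a id='0'></a>\n"]},
]}

missing = missingAnchors(toy)
print(missing)
```

Applied to the NB dictionary built above, after the ToC cell has been inserted, the same call is expected to return an empty list when every title received its anchor.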


