# Data Science & More

## Testing

- [pytest](https://pytest.org/en/latest/index.html)
- [doctest](https://docs.python.org/3.7/library/doctest.html)
- [unittest](https://docs.python.org/3.7/library/unittest.html)

- [Testing Python Applications with PyTest](https://semaphoreci.com/community/tutorials/testing-python-applications-with-pytest)
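
As a minimal sketch, here is one behaviour checked three ways: a doctest embedded in the docstring, a `unittest.TestCase`, and a plain `test_*` function with a bare `assert`, which is the style pytest collects. The function `mean` and the module name `my_module.py` are hypothetical. Saved as `my_module.py`, this could be run with `python -m doctest -v my_module.py`, `python -m unittest my_module`, or `pytest my_module.py`.

```python
import doctest  # used by `python -m doctest` to run the docstring examples
import unittest
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Return the arithmetic mean of a non-empty sequence.

    >>> mean([1, 2, 3])
    2.0
    >>> mean([])
    Traceback (most recent call last):
        ...
    ValueError: mean() requires at least one value
    """
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    """unittest style: methods named test_* on a TestCase subclass."""

    def test_simple_average(self) -> None:
        self.assertEqual(mean([2, 4]), 3.0)

    def test_empty_raises(self) -> None:
        with self.assertRaises(ValueError):
            mean([])


def test_mean_with_bare_assert() -> None:
    """pytest style: a plain function using a bare assert."""
    assert mean([10]) == 10.0
```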

## Debugging

- The Python Debugger: [pdb](https://docs.python.org/3.7/library/pdb.html)
    - See also [ipdb](https://github.com/gotcha/ipdb), which gives pdb's interface IPython features such as tab completion and syntax highlighting
- [Five exercises to master the Python debugger](https://www.tjelvarolsson.com/blog/five-exercises-to-master-the-python-debugger/)


## Profiling

- [timeit](https://docs.python.org/3.7/library/timeit.html)
- [cProfile](https://docs.python.org/3/library/profile.html)
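
A minimal sketch of both tools used from code rather than from the command line: `timeit.timeit` runs a callable many times and returns the total seconds, while `cProfile.Profile` records function calls and `pstats.Stats` formats them. The function `build_squares` and the workload sizes are hypothetical.

```python
import cProfile
import io
import pstats
import timeit


def build_squares(n: int) -> list:
    """Hypothetical workload: a list of squares built with a comprehension."""
    return [i * i for i in range(n)]


# timeit: call the function `number` times and return the total elapsed seconds.
total = timeit.timeit(lambda: build_squares(1_000), number=1_000)
print(f"1000 calls took {total:.4f} s")

# cProfile: record every function call made while the profiler is enabled.
profiler = cProfile.Profile()
profiler.enable()
build_squares(100_000)
profiler.disable()

# pstats: sort by cumulative time and write the top 5 rows into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

`timeit` suits micro-benchmarks of small snippets; `cProfile` is the better fit for finding which functions dominate a whole program's runtime.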


## Performance Tuning

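- The Global Interpreter Lock (GIL): in CPython only one thread executes Python bytecode at a time, so threads rarely speed up CPU-bound pure-Python code (I/O-bound code can still benefit)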

### GIL Articles
- [What's the deal with the Python GIL?](https://realpython.com/python-gil/)
- [Understanding the Python GIL (PDF slides)](https://www.dabeaz.com/python/UnderstandingGIL.pdf)
- [The Python GIL](https://rohanvarma.me/GIL/)
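
A minimal sketch, assuming a standard GIL-enabled CPython build: a CPU-bound pure-Python function run in two threads typically takes about as long as running it twice sequentially, because the threads take turns holding the GIL. The function `count_down` and the loop size `N` are hypothetical, and exact timings vary by machine.

```python
import threading
import time


def count_down(n: int) -> int:
    """Hypothetical CPU-bound work: a pure-Python loop with no I/O."""
    while n > 0:
        n -= 1
    return n


def run_sequential(n: int) -> float:
    """Run the work twice, one after the other; return elapsed seconds."""
    start = time.perf_counter()
    count_down(n)
    count_down(n)
    return time.perf_counter() - start


def run_threaded(n: int) -> float:
    """Run the same work in two threads at once; return elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


N = 2_000_000
seq = run_sequential(N)
thr = run_threaded(N)
print(f"sequential: {seq:.2f} s, two threads: {thr:.2f} s")
```

For CPU-bound work across several cores, `multiprocessing` or `concurrent.futures.ProcessPoolExecutor` sidesteps the GIL by using separate interpreter processes; threads remain useful for I/O-bound work, since blocking I/O calls release the GIL.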


## Data Science

- [Python Data Science Handbook (O'Reilly)](https://jakevdp.github.io/PythonDataScienceHandbook/)
    - [PDSH GitHub repo](https://github.com/jakevdp/PythonDataScienceHandbook)

### Data Structures & Manipulation

- [numpy](http://www.numpy.org/)
- [pandas](http://pandas.pydata.org/)
    - [Introductory Notebook](http://nbviewer.jupyter.org/github/donnemartin/data-science-ipython-notebooks/blob/master/pandas/pandas.ipynb)
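
A minimal sketch of typical first steps with made-up data: NumPy for vectorised arithmetic on arrays, pandas for labelled tabular data and group-wise aggregation. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic applies element-wise to the whole array, no Python loop.
prices = np.array([10.0, 12.5, 8.0, 15.0])
with_tax = prices * 1.2

# pandas: a DataFrame is a table of named columns sharing one index.
df = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B"],
        "price": prices,
        "units": [3, 1, 4, 2],
    }
)
df["revenue"] = df["price"] * df["units"]  # new column, computed row by row

# Group the rows by store and sum the revenue within each group.
revenue_by_store = df.groupby("store")["revenue"].sum()
print(revenue_by_store)
```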

### Plotting

- [matplotlib](https://matplotlib.org/) ([examples](https://matplotlib.org/gallery/index.html))
- [seaborn](https://seaborn.pydata.org/index.html) ([examples](https://seaborn.pydata.org/examples/index.html))
- [bokeh](https://bokeh.pydata.org/en/latest/) ([gallery](https://bokeh.pydata.org/en/latest/docs/gallery.html#gallery), [quickstart](http://nbviewer.jupyter.org/github/bokeh/bokeh-notebooks/blob/master/quickstart/quickstart.ipynb))
- [plotly](https://github.com/plotly/plotly.py)
- [plotly dash](https://github.com/plotly/dash)
    - [dash app gallery](https://dash-gallery.plotly.host/Portal/)
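
A minimal matplotlib sketch, assuming a script run without a display: the non-interactive `Agg` backend renders straight to a file instead of opening a window. The data and the output filename `sine.png` are hypothetical.

```python
import math

import matplotlib

matplotlib.use("Agg")  # select an off-screen backend before importing pyplot
import matplotlib.pyplot as plt

xs = [i / 10 for i in range(63)]  # 0.0 .. 6.2
ys = [math.sin(x) for x in xs]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(xs, ys, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("sine.png", dpi=100)
plt.close(fig)  # release the figure's memory
```

seaborn builds on matplotlib, so the same `fig`/`ax` objects and `savefig` call apply to its plots as well.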


### Community

- [kaggle](https://www.kaggle.com/) ([kernels](https://www.kaggle.com/kernels))
    
### scikit-learn

- [Cheat sheet](https://raw.githubusercontent.com/kailashahirwar/cheatsheets-ai/master/Scikit%20Learn.png)
- [Your First Machine Learning Model](https://www.kaggle.com/dansbecker/your-first-machine-learning-model)
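
A minimal sketch of the fit/predict pattern that this kind of tutorial walks through, using a tiny made-up dataset rather than a real one: an estimator is created, trained with `fit(X, y)`, then used with `predict(X)` and scored on held-out rows. The features and target values are hypothetical.

```python
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: [rooms, area in m^2] -> price in thousands.
X = [[1, 40], [2, 55], [2, 60], [3, 80], [3, 85], [4, 110], [4, 120], [5, 150]]
y = [100, 140, 150, 200, 210, 280, 300, 380]

# Hold out a quarter of the rows to check the model on unseen data.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeRegressor(random_state=1)
model.fit(X_train, y_train)
predictions = model.predict(X_valid)
mae = mean_absolute_error(y_valid, predictions)
print(f"validation MAE: {mae:.1f}")
```

Swapping `DecisionTreeRegressor` for another estimator (for example `RandomForestRegressor` from `sklearn.ensemble`) leaves the rest of the code unchanged, which is the main point of the shared interface.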


### Cheat Sheets

- [NumPy](https://raw.githubusercontent.com/kailashahirwar/cheatsheets-ai/master/Numpy.png)
- Pandas: [part 1][1], [part 2][2], [part 3][3]
- [Matplotlib](https://raw.githubusercontent.com/kailashahirwar/cheatsheets-ai/master/Matplotlib.png)

[1]: https://raw.githubusercontent.com/kailashahirwar/cheatsheets-ai/master/Pandas-1.jpg
[2]: https://raw.githubusercontent.com/kailashahirwar/cheatsheets-ai/master/Pandas-2.jpg
[3]: https://raw.githubusercontent.com/kailashahirwar/cheatsheets-ai/master/Pandas-3.png

## Exercise: First steps with pdb

Run any Python script and step through it with pdb from the command line, _or_ with the VS Code debugger:

    python -m pdb my_script.py
    
Example code for debugging:

    import random

    x = 0
    for _ in range(10):
        x += 1  # step through this loop and watch x change
    print(f"The final value of x is: {x}")
    print(random.randrange(1, 101))  # random integer from 1 to 100

- notice that typing `?` (or `help`) lists the available pdb commands, and `help <command>` describes one of them
- set at least one breakpoint with the `break` command (e.g. `break <lineno>`) and experiment with the `continue`, `next`, and `step` commands
- use the `display` command (e.g. `display x`) to watch an expression: pdb prints it whenever its value has changed when execution stops in the current frame
- experiment with the `restart` command
- experiment with assigning new values to variables during a debugging session (e.g. `x = 100`)
- experiment with the `list` command (and `ll`, which lists the whole current function)

## Exercise: Profiling with cProfile

Profile a script of your choice from the command line with `python -m cProfile my_script.py`.

Sample script:

    import time
    
    
    def f1(n: int = 3) -> int:
        """Simulate a slow call by sleeping for n seconds."""
        print("Running SLOW function")
        time.sleep(n)
        return 42


    def f2() -> int:
        """Return immediately."""
        print("Running FAST function")
        return 43


    f1()
    f1(5)
    f1(10)
    f2()
    f2()

- sort the output by cumulative time (`cumtime`)
- save the result to a text file
- save the result in the default binary output format
- data in the binary format can be visualised with several tools, e.g. gprof2dot (which needs Graphviz installed to provide the `dot` command):

        gprof2dot -f pstats cProfile.data | dot -Tpng -o my_profile.png
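
The sorting and saving steps can also be done from Python with the standard-library `pstats` module, which reads cProfile's binary format. A minimal sketch: it profiles a hypothetical `work` function and dumps the data to `cProfile.data` (the same kind of file `python -m cProfile -o cProfile.data my_script.py` writes), then writes a text report to `profile.txt`; both filenames are assumptions.

```python
import cProfile
import pstats


def work() -> int:
    """Hypothetical function to profile."""
    return sum(i * i for i in range(200_000))


# Profile one call and save the stats in cProfile's binary format.
profiler = cProfile.Profile()
profiler.runcall(work)
profiler.dump_stats("cProfile.data")

# Reload the binary data, sort by cumulative time, and write a text report.
with open("profile.txt", "w") as report:
    stats = pstats.Stats("cProfile.data", stream=report)
    stats.strip_dirs().sort_stats("cumulative").print_stats(10)
```

`strip_dirs()` shortens file paths in the report; `print_stats(10)` limits it to the ten most expensive entries.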
