The aim of this document is to give a whirlwind tour of Python and useful data science libraries for people new to Python. It does not try to cover everything; it introduces a few relevant topics and points to reasonably good resources and links for further reading.
NOTE: This is just a hastily written guide, and is not affiliated with CQF / Man Group or any of the links / courses in any way.
It's quite easy to start using pandas / numpy and copy-pasting boilerplate code without understanding the basics of the language itself (which is probably part of why these libs are so successful). But before delving into ML and other advanced DS topics, I think it's critical to understand at least these basic topics in Python:
- Variables, constants and common data-types
- Common data structures: lists, sets, tuples, dictionaries
- Conditionals and loops: if/else and for
- Functions: custom-defined ones and ones imported from the standard lib
- Basic file parsing and error handling
Obviously Python is a vast topic and this only covers a subset, but imo it covers the relevant fundamentals needed before starting with DS 101. I glanced at a few introductory Python courses, for example:
https://developers.google.com/edu/python/introduction
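As a rough illustration of the basic topics listed above, here is a minimal sketch that touches each one in turn. All names and data in it (`scores`, `mean`, `parse_scores`, the inline CSV text) are made up for the example, and `io.StringIO` stands in for a real file so the snippet runs without anything on disk.

```python
import csv
import io
from statistics import median  # importing a function from the standard lib

# Variables and common data types
course = "Python 101"   # str
pass_mark = 50.0        # float
MAX_SCORE = 100         # "constants" are only a naming convention (UPPER_CASE)

# Common data structures
scores = [72, 45, 88]                  # list: ordered, mutable
point = (1.0, 2.0)                     # tuple: ordered, immutable
tags = {"python", "intro", "python"}   # set: duplicates collapse
ages = {"alice": 31, "bob": 27}        # dict: key -> value

# Conditionals and loops
passed = []
for s in scores:
    if s >= pass_mark:
        passed.append(s)


# A custom function
def mean(values):
    """Return the arithmetic mean, or None for an empty sequence."""
    if not values:
        return None
    return sum(values) / len(values)


# Basic file parsing and error handling (hypothetical CSV content)
RAW = "name,score\nalice,72\nbob,n/a\ncarol,88\n"


def parse_scores(handle):
    """Parse name/score rows, collecting names whose score is not a number."""
    parsed, skipped = {}, []
    for row in csv.DictReader(handle):
        try:
            parsed[row["name"]] = float(row["score"])
        except ValueError:
            skipped.append(row["name"])
    return parsed, skipped


parsed, skipped = parse_scores(io.StringIO(RAW))
print(passed, mean(scores), median(scores), parsed, skipped)
```

With a real file, `io.StringIO(RAW)` would be replaced by something like `open(path, newline="")` inside a `with` block.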
TODO
TODO
TODO
TODO
This section broadly covers the fundamentals of NumPy and pandas, and the basics of plotting.
- NumPy: Memory structure and useful modules
- Pandas: Reading data, indexing, filtering, aggregating and working with bad data.
- Plotting: basics and common libs
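To give a flavour of the pandas topics above, the following is a small sketch of reading, indexing, filtering, handling bad data and aggregating. The dataset, column names and the `"missing"` marker are invented for illustration, and plotting is left out to keep the snippet dependency-light.

```python
import io

import pandas as pd

# Hypothetical CSV with two kinds of bad data: a text marker and an empty cell
RAW = """date,ticker,price,volume
2024-01-02,AAA,10.0,100
2024-01-02,BBB,20.0,
2024-01-03,AAA,11.0,150
2024-01-03,BBB,missing,300
"""

# Reading data: parse the date column and treat "missing" as NaN
df = pd.read_csv(io.StringIO(RAW), parse_dates=["date"], na_values=["missing"])

# Indexing: label-based (.loc) and position-based (.iloc)
first_price = df.loc[0, "price"]
first_row = df.iloc[0]

# Filtering with a boolean mask (NaN compares as False, so it is excluded)
expensive = df[df["price"] > 10.5]

# Working with bad data: count, drop or fill missing values
n_missing_price = df["price"].isna().sum()
clean = df.dropna(subset=["price"])
volume_filled = df["volume"].fillna(0)

# Aggregating: mean price per ticker over the cleaned rows
mean_price = clean.groupby("ticker")["price"].mean()
print(mean_price)
```

Whether to drop or fill missing values depends on the data; both are shown here only to illustrate the calls.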
a) Memory layout
b) Vectorization
c) C implementation
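The three NumPy points above can be illustrated with a short sketch. The array shapes and variable names are arbitrary choices for the example; the stride values assume 8-byte `int64` elements.

```python
import numpy as np

# a) Memory layout: a 2x3 int64 array stored row by row (C order)
a = np.arange(6, dtype=np.int64).reshape(2, 3)
c_contiguous = a.flags["C_CONTIGUOUS"]
strides = a.strides  # bytes to step along each axis: (24, 8) for int64

# Transposing returns a view on the same buffer with swapped strides, not a copy
t = a.T
same_buffer = np.shares_memory(a, t)
t_strides = t.strides

# b) Vectorization: one array expression instead of a Python-level loop
x = np.arange(1_000, dtype=np.float64)
loop_result = 0.0
for v in x:  # interpreted loop, one element at a time
    loop_result += v * v
vec_result = (x * x).sum()  # same computation as whole-array operations

# c) C implementation: operations like `x * x` and `.sum()` run inside
# NumPy's compiled code, which is why the vectorized form is typically
# much faster than the loop on large arrays.
print(strides, t_strides, same_buffer, loop_result == vec_result)
```

Timing the two versions (for example with `timeit`) on a larger array is a simple way to see the difference in practice.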