These are materials for an introductory course in data science using Python, Pandas and Scikit-learn. The materials include a textbook written entirely in Jupyter notebooks (and also hosted as a Jupyter Book).
Because the textbook is a set of Jupyter notebooks, students can download them and work with everything interactively. The textbook has the following chapters:
- First steps with Pandas and Jupyter notebooks
- Plotting and grouping data
- Correlation
- Linear regression
- Training models
- Classification
- Nearest neighbors
- Decision trees
- Improving your model
- Ensemble tree models
- Working with large datasets
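As a rough illustration of the kind of workflow these chapters build toward, here is a minimal sketch that groups data, measures a correlation, and fits a linear regression on a held-out split. The dataset is synthetic and the column names (`section`, `hours_studied`, `score`) are made up for this example; none of it is taken from the textbook itself.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, hypothetical data: exam scores that rise with hours studied.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=200)
df = pd.DataFrame({
    "section": rng.choice(["A", "B"], size=200),
    "hours_studied": hours,
    "score": 50 + 4 * hours + rng.normal(0, 5, size=200),
})

# Grouping: mean score for each course section.
mean_by_section = df.groupby("section")["score"].mean()

# Correlation: Pearson correlation between the two numeric columns.
corr = df["hours_studied"].corr(df["score"])

# Training a model: hold out 25% of the rows to evaluate the fit.
X_train, X_test, y_train, y_test = train_test_split(
    df[["hours_studied"]], df["score"], test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R^2 on the held-out rows

print(mean_by_section)
print(f"correlation: {corr:.2f}")
print(f"slope: {model.coef_[0]:.2f}, test R^2: {r2:.2f}")
```

Because the data are generated with a true slope of 4, the fitted slope should land close to that value; with real data there is no such ground truth to compare against.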
The textbook is open source and you are free to modify it as needed. If you find errors or have suggestions for changes or improvements, please contact me at psavala@stedwards.edu.
If you would like to build the textbook so that it is viewable in a browser, you can do so using the Python package Jupyter Book. First, clone this repository. Then install Jupyter Book using `pip install jupyter-book`. Next, navigate to the folder you cloned this repository into (likely named `Introduction-to-Data-Science`) and run `jupyter-book build textbook/`. Finally, follow the instructions shown by Jupyter Book.
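If you prefer to script the build step, the same command can be run from Python. This is only a sketch: `build_command` and `build_book` are hypothetical helper names, not part of this repository or of Jupyter Book, and the code assumes that the `jupyter-book` executable is already installed and on your `PATH`.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(book_dir="textbook/"):
    """Return the build command from the instructions as an argument list."""
    return ["jupyter-book", "build", book_dir]


def build_book(repo_root, book_dir="textbook/"):
    """Run the build from inside the cloned repository folder."""
    if shutil.which("jupyter-book") is None:
        raise RuntimeError(
            "jupyter-book was not found; install it with 'pip install jupyter-book'"
        )
    # check=True raises CalledProcessError if the build exits with an error.
    subprocess.run(build_command(book_dir), cwd=Path(repo_root), check=True)


# Example call (assumes the default folder name created by cloning):
# build_book("Introduction-to-Data-Science")
```

Running the command through `subprocess.run` with `cwd` set to the repository folder mirrors the manual steps of navigating there first and then typing the command.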
Four projects take students from basic exploratory data analysis up to building advanced models on multi-gigabyte datasets. Each project is designed to take approximately three weeks and culminates in a presentation, a Jupyter notebook showing all work, and an article written for a general audience. In my course I require students to post their article on their LinkedIn page in order to appeal to potential future employers.
Skills challenges are weekly labs in which students are given a dataset and a number of questions and tasks to work through. Students must complete them on their own, but are welcome to use any external resources they wish. The challenges reinforce the idea that students need to be able to accomplish routine tasks quickly.
If you have questions, comments or feedback, please contact me at psavala@stedwards.edu. You can also connect with me on LinkedIn.