Course materials for General Assembly's Data Science course in Boston, MA (20 January 2015 - 07 April 2015). View student work in the student repository.
Instructor: Bryan Balin. Teaching Assistant: Harish Krishnamurthy.
Office hours: Wednesday 5-6pm; Friday 5-6:30pm, @Boston Public Library; as needed Tuesdays and Thursdays @GA.
Tuesday | Thursday |
---|---|
1/20: Introduction | 1/22: Python & Pandas |
1/27: Git and GitHub | 1/29: Getting Data |
2/3: Advanced Pandas (Milestone: Question and Data Set) | 2/5: Numpy, Machine Learning, KNN |
2/10: scikit-learn, Model Evaluation Procedures | 2/12: Linear Regression |
2/17: Logistic Regression, Preview of Other Models | 2/19: Model Evaluation Metrics (Milestone: Data Exploration and Analysis Plan) |
2/24: Working a Data Problem | 2/26: Clustering and Visualization (Milestone: Deadline for Topic Changes) |
3/3: Naive Bayes | 3/5: Natural Language Processing |
3/10: Decision Trees and Ensembles (Milestone: First Draft) | 3/12: Advanced scikit-learn |
3/17: No Class | 3/19: Databases and MapReduce |
3/24: Recommenders | 3/26: Course Review, Companion Tools (Milestone: Second Draft, Optional) |
3/31: TBD | 4/2: Project Presentations |
4/7: Project Presentations | |
- Install the Anaconda distribution of Python 2.7.x.
- Install Git and create a GitHub account.
- Once you receive an email invitation from Slack, join our "datbos05 team" and add your photo!
- Introduction to General Assembly
- Course overview: our philosophy and expectations (slides)
- Data science overview (slides)
- Tools: check for proper setup of Anaconda, overview of Slack
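One way to check an Anaconda setup is to confirm that the main libraries can be found by Python. The sketch below is illustrative only: the package list is an assumption based on the topics in the schedule, `check_setup` is a hypothetical helper name, and it uses `importlib.util`, which requires Python 3 rather than the 2.7 release the course installs.

```python
import importlib.util
import sys

# Assumed package list, based on topics named in the course schedule.
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_setup(packages=PACKAGES):
    """Return {package_name: bool} saying whether each package can be located."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }


# Report the interpreter version and the status of each package.
print("Python", sys.version.split()[0])
for name, found in check_setup().items():
    print(name, "OK" if found else "MISSING")
```

Any package reported as MISSING is a good candidate to resolve before the next class.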
Homework:
- Resolve any installation issues before next class.
Optional:
- Review the code for a recap of some Python basics.
- Read Analyzing the Analyzers for a useful look at the different types of data scientists.
- Check out the PyData Boston Meetup page to become acquainted with the local data community.
- Brief overview of Python
- Brief overview of Python environments: Python scripting, IPython interpreter, Spyder
- Working with data in Pandas
  - Loading and viewing data
  - Indexing and selecting data
  - Assigning, reassigning, and splitting data
  - Describing and summarizing data
  - Plotting data
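The Pandas topics above can be sketched in a few lines. This is a minimal illustration, not the class code: the data set is invented, the variable names and the `plot_scores` helper are hypothetical, and the syntax assumes a current Python 3 and pandas installation rather than the Python 2.7 environment used in the course.

```python
import io

import pandas as pd

# Hypothetical data, invented for this example (not a course data set).
CSV_TEXT = """name,city,score
Ann,Boston,82
Bo,Cambridge,91
Cy,Boston,77
Di,Somerville,88
"""

# Loading and viewing data
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.head())

# Indexing and selecting data
boston = df[df["city"] == "Boston"]                       # rows matching a condition
first_row_name = df.loc[0, "name"]                        # one cell, by row label and column
top_scores = df.loc[df["score"] > 80, ["name", "score"]]  # filtered rows, chosen columns

# Assigning, reassigning, and splitting data
df["passed"] = df["score"] >= 80                          # assign a new column
df.loc[df["name"] == "Cy", "score"] = 79                  # reassign a single value
first_half, second_half = df.iloc[:2], df.iloc[2:]        # split by row position

# Describing and summarizing data
print(df["score"].describe())
mean_by_city = df.groupby("city")["score"].mean()
print(mean_by_city)


# Plotting data (requires matplotlib, so it is wrapped in a function
# and not called automatically)
def plot_scores(frame):
    """Draw a bar chart of score by name and return the matplotlib Axes."""
    return frame.plot(x="name", y="score", kind="bar")
```

Calling `plot_scores(df)` in an interactive session such as IPython or Spyder draws the chart.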
Homework:
- Do the class homework by Tuesday.
- Read through the project page in detail.
- Review a few projects from past Data Science courses to get a sense of the variety and scope of student projects.
Optional:
- If you need more practice with Python, review the "Python Overview" section of A Crash Course in Python, work through some of Codecademy's Python course, or work through Google's Python Class and its exercises.
- For more project inspiration, browse the student projects from Andrew Ng's Machine Learning course at Stanford.
Resources:
- Online Python Tutor is useful for visualizing (and debugging) your code.
- Check for proper setup of Git by running `git clone https://github.com/bbalin12/DAT-project-examples.git`