Master Python for Data Science is an initiative of the Nairobi Women in Machine Learning and Data Science community. The initiative seeks to empower community members by improving their skill sets, and in so doing prepare them to take up the various opportunities in the industry.
The course is virtual and completely self-paced, with a Slack community to help you collaborate with others who are taking the course as well.
At the end of this assignment you will be required to upload a notebook with the data challenge (described below) before you proceed to the next assignment.
For this lesson we will work through chapters one to three of the Python Data Science Handbook by Jake VanderPlas (link below) and cover the following topics:
Chapter 1: IPython: Beyond Normal Python
- IPython: Beyond Normal Python
- Help and Documentation in IPython
- Keyboard Shortcuts in the IPython Shell
- IPython Magic Commands
- Input and Output History
- IPython and Shell Commands
- Errors and Debugging
- Profiling and Timing Code
- More IPython Resources
Chapter 2: Introduction to NumPy
- Introduction to NumPy
- Understanding Data Types in Python
- The Basics of NumPy Arrays
- Computation on NumPy Arrays: Universal Functions
- Aggregations: Min, Max, and Everything In Between
- Computation on Arrays: Broadcasting
- Comparisons, Masks, and Boolean Logic
- Fancy Indexing
- Sorting Arrays
- Structured Data: NumPy's Structured Arrays
Chapter 3: Data Manipulation with Pandas
- Introducing Pandas Objects
- Data Indexing and Selection
- Operating on Data in Pandas
- Handling Missing Data
- Hierarchical Indexing
- Combining Datasets: Concat and Append
- Combining Datasets: Merge and Join
- Aggregation and Grouping
- Pivot Tables
- Vectorized String Operations
- Working with Time Series
- High-Performance Pandas: eval() and query()
- Further Resources
We will be looking at a competition on Kaggle: Data Science for Good: Kiva Crowdfunding (link on the datasets channel on the community Slack).
In this challenge, Kiva, an online crowdfunding platform, is inviting the community to help them build more localized models to estimate the poverty levels of residents in the regions where Kiva has active loans.
The aim will be to explore the data using Python to help Kiva understand their borrowers and their poverty levels so as to better assess and maximize the impact of their work. Participants should develop their own creative approaches to addressing the objective.
Submissions for this challenge will take the form of a Python data analysis in a Jupyter notebook, uploaded to the repository. To make this as interactive as possible, everyone will share a link to their well-documented notebook on Slack as soon as they are done with their analysis. You can find the data on the competition's page.
This will be a good chance to learn how to make reports and visualizations using Jupyter notebooks. Looking forward to what you will come up with.
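If you are unsure where to start, here is a minimal sketch of what a first pass over the data might look like, using pandas operations from Chapter 3 (missing data, grouping and pivot tables). The rows below are made-up toy values, and the column names (country, sector, loan_amount) and the file path in the comment are assumptions for illustration only, so check them against the actual files before reusing any of this.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real loans file. The column names
# are assumptions; check them against the actual CSV.
loans = pd.DataFrame(
    {
        "country": ["Kenya", "Kenya", "Philippines", "Philippines", "Peru"],
        "sector": ["Agriculture", "Retail", "Retail", "Retail", "Agriculture"],
        "loan_amount": [300.0, 450.0, 200.0, None, 1000.0],
    }
)
# With the real data you would load the file instead, for example:
# loans = pd.read_csv("path/to/loans.csv")  # hypothetical path

# Count missing values per column before aggregating.
missing_per_column = loans.isnull().sum()

# Summarise loan sizes per country; count, mean and median skip NaN values.
by_country = (
    loans.groupby("country")["loan_amount"]
    .agg(["count", "mean", "median"])
    .sort_values("mean", ascending=False)
)

# Pivot table: total loan amount for each country and sector combination.
by_country_sector = loans.pivot_table(
    index="country", columns="sector", values="loan_amount", aggfunc="sum"
)

print(missing_per_column)
print(by_country)
print(by_country_sector)
```

This is only one possible starting point. From here you could add visualizations or bring in other files from the competition, depending on the approach you choose.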