# Module 1: Data Wrangling with Python

## Sprint 1: Python Mastery

## Part 2: Improving Code Reliability

## About this Part

In this Part, we will focus on improving code reliability with type hints and static type checking. Type hints were added to the language in Python 3.5 (PEP 484).
Static type checking lets you catch many type errors without running your code, and the type hints themselves make your code easier to read and understand.
You will learn how to do this with MyPy, one of the most widely used type checkers for Python.
You will also see some examples of how to write clean code in Python.
We will also learn about our first machine learning algorithm: k-means clustering.
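As a first taste of what MyPy checks, here is a minimal sketch with two hypothetical functions (`mean` and `find_index`, not taken from the course materials). It assumes Python 3.9+ for the built-in `list[...]` annotations. Running `mypy` on a file that contains this code checks the annotations without executing anything.

```python
from typing import Optional


def mean(values: list[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def find_index(words: list[str], target: str) -> Optional[int]:
    """Return the position of target in words, or None if it is absent."""
    for i, word in enumerate(words):
        if word == target:
            return i
    return None


print(mean([1.5, 2.5]))             # OK: the argument matches list[float]
print(find_index(["a", "b"], "c"))  # OK: None is an allowed return value

# MyPy would report both lines below. The first would also fail at runtime;
# the second happens to work here, which is exactly the kind of latent bug
# a static checker catches before the code ever runs.
# mean("not a list")               # str is not compatible with list[float]
# find_index(["a", "b"], "b") + 1  # the result may be None, so "+ 1" is unsafe
```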


Starting with this Part, each resource will have a suggested depth level and an approximate number of hours that you should spend on it.

## Depth levels and time estimates

Depth categories will be used as follows:

- C1 - watch or read the material. No need to look up the documentation of unfamiliar code or look up new concepts in other resources. If there are exercises that are part of the resource, you can treat them as optional. If you feel that you do not understand more than half of the concepts or code in the resource, you are most likely missing some prerequisites. Your understanding of the new material should be 45-65%.
- C2 - watch or read the material attentively. Look up the documentation of unfamiliar code and look up new concepts in other resources if they feel unclear. Your understanding of the new material should be 65-75%.
- C3 - watch or read the material very attentively. Look up the documentation of unfamiliar code and look up unfamiliar concepts in other resources until you are confident in your understanding. Your understanding of the new material should be 75-85%.
- C4 - watch or read the material while replicating all code examples that you read or see on the screen. Look up the documentation of unfamiliar code and retype all unfamiliar code blocks. Look up unfamiliar concepts in other resources until you are confident in your understanding. It is OK if some concepts (or code) are still unclear after you finish the material; if so, spend time researching them in other resources. Your understanding of the new material should be 85-90%.
- C5 - watch or read the material while replicating all code examples that you read or see on the screen. If some concepts (or code) are still unclear after you finish the material, spend time researching them in other resources. Your understanding of the new material should be 90-95%.
- C6 - watch or read the material while replicating all code examples that you read or see on the screen. In addition, try to replicate the functionality or concept yourself with different parameters (e.g., a different dataset). If some concepts (or code) are still unclear after you finish the material, spend time researching them in other resources. Your understanding of the new material should be 95-100%.

For example, (C4, 1.5) next to a resource means that you should read the material carefully while replicating the code, aim for an understanding of 85-90%, and spend approximately 1.5 hours on it.


## Objectives for this Part

- Learn about static type checking with MyPy
- Learn k-means clustering, an unsupervised learning algorithm
- Learn how to do data extraction, transformation, and analysis using defaultdict
- Learn how to read CSV files using iterators
- Learn how to apply Clean Code principles in Python
- Understand how Python dictionaries work
- Learn the basics of testing in Python
- Understand the differences between software license types
- Practice Python skills on HackerRank
- Practice using strings in Python by doing the Pig Latin exercise
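The defaultdict and CSV-iterator objectives above can be previewed with a small sketch. The helper `average_by_key`, the column names, and the sample data are hypothetical and not taken from Hettinger's lessons; `io.StringIO` stands in for a real file so the example is self-contained (Python 3.9+ assumed).

```python
import csv
import io
from collections import defaultdict
from typing import Iterable

# Hypothetical sample data; a real file would be opened with open(path, newline="").
SAMPLE_CSV = """region,sales
north,10
south,4
north,20
east,7
"""


def average_by_key(lines: Iterable[str], key: str, value: str) -> dict[str, float]:
    """Group rows by the `key` column and average the numeric `value` column."""
    groups: defaultdict[str, list[float]] = defaultdict(list)
    # csv.DictReader is an iterator: it yields one row (as a dict) at a time,
    # so the file is read row by row instead of being loaded whole.
    for row in csv.DictReader(lines):
        # A missing key starts as an empty list, so no "if key in groups" check is needed.
        groups[row[key]].append(float(row[value]))
    return {k: sum(v) / len(v) for k, v in groups.items()}


print(average_by_key(io.StringIO(SAMPLE_CSV), "region", "sales"))
# {'north': 15.0, 'south': 4.0, 'east': 7.0}
```

The defaultdict removes the usual "create the list if the key is new" branch, which is what makes the grouping loop a single line.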

### [Big Ideas and Little Code in Python by Raymond Hettinger](https://learning.oreilly.com/videos/modern-python-livelessons/9780134743400/)

- Lesson 3: Improving Reliability with MyPy and Type Hinting (C4, 1)
- Lesson 4: Implementing k-means Unsupervised Machine Learning (C4, 1.5)
- Lesson 5: Building Additional Skills for Data Analysis (C4, 1)
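Lesson 4 builds k-means from scratch. As a rough preview, the two alternating steps (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched as below. This is a minimal illustration assuming 2-D points and Python 3.9+; the function names are hypothetical and this is not the code from the video.

```python
import math
import random
from collections import defaultdict

Point = tuple[float, float]


def mean_point(points: list[Point]) -> Point:
    """Centroid of a non-empty list of points (coordinate-wise average)."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def assign(points: list[Point], centroids: list[Point]) -> dict[Point, list[Point]]:
    """Assignment step: group every point under its nearest centroid."""
    clusters: dict[Point, list[Point]] = defaultdict(list)
    for p in points:
        nearest = min(centroids, key=lambda c: math.dist(p, c))
        clusters[nearest].append(p)
    return clusters


def k_means(points: list[Point], k: int, iterations: int = 10) -> list[Point]:
    """Alternate assignment and update steps for a fixed number of rounds."""
    centroids = random.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its points.
        # A centroid that attracted no points is simply dropped in this sketch.
        centroids = [mean_point(group) for group in clusters.values()]
    return centroids


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(sorted(k_means(data, k=2)))  # roughly [(0.33, 0.33), (10.33, 10.33)]
```

Practical implementations usually add smarter initialization and stop once the centroids no longer move; this sketch just runs a fixed number of rounds to keep the two steps visible.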

*Additional note:* While type hinting got off to a relatively slow start in the industry, type hints are now common in larger production-level Python applications.

### [clean-code-python](https://github.com/zedr/clean-code-python) (C3, 1)


### [How Python dictionaries work](https://tenthousandmeters.com/blog/python-behind-the-scenes-10-how-python-dictionaries-work/) (C1, 1)


### [Getting Started With Testing in Python](https://realpython.com/python-testing/) (C1, 1)


### [Choosing a License](https://docs.python-guide.org/writing/license/) (C1, 0.5)


### [HackerRank](https://www.hackerrank.com/)

- [Collections.deque() (C4, 0.5)](https://www.hackerrank.com/challenges/py-collections-deque/problem)


### [Python Workout](https://learning.oreilly.com/library/view/python-workout/9781617295508) (C4, 1)

This is how you should complete this task:

- Read the exercise description below.
- Try to solve the exercise yourself.
- Read the WORKING IT OUT, SOLUTION, and BEYOND THE EXERCISE sections from the book.

#### Chapter 2 (Strings), Exercise 5: Pig Latin

Pig Latin (http://mng.bz/YrON) is a common children's "secret" language in English-speaking countries. (It's normally secret among children who forget that their parents were once children themselves.) The rules for translating words from English into Pig Latin are quite simple:

If the word begins with a vowel (a, e, i, o, or u), add "way" to the end of the word. So "air" becomes "airway" and "eat" becomes "eatway."

If the word begins with any other letter, then we take the first letter, put it on the end of the word, and then add "ay." Thus, "python" becomes "ythonpay" and "computer" becomes "omputercay."

(And yes, I recognize that the rules can be made more sophisticated. Let's keep it simple for the purposes of this exercise.)

For this exercise, write a Python function (pig_latin) that takes a string as input, assumed to be an English word. The function should return the translation of this word into Pig Latin. You may assume that the word contains no capital letters or punctuation.

This exercise isn't meant to help you translate documents into Pig Latin for your job. (If that is your job, then I really have to question your career choices.) However, it demonstrates some of the powerful techniques that you should know when working with sequences, including searches, iteration, and slices. It's hard to imagine a Python program that doesn't include any of these techniques.
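Once you have written your own `pig_latin`, a quick way to check it against the four examples given in the exercise is a small helper like the sketch below. `check_pig_latin` is a hypothetical helper, not part of the book, and it deliberately contains no solution (Python 3.9+ assumed).

```python
from typing import Callable

# Expected translations taken from the exercise description.
EXAMPLES: dict[str, str] = {
    "air": "airway",
    "eat": "eatway",
    "python": "ythonpay",
    "computer": "omputercay",
}


def check_pig_latin(pig_latin: Callable[[str], str]) -> list[str]:
    """Run pig_latin on every example and return a message for each mismatch."""
    failures: list[str] = []
    for word, expected in EXAMPLES.items():
        actual = pig_latin(word)
        if actual != expected:
            failures.append(f"{word!r}: expected {expected!r}, got {actual!r}")
    return failures
```

After defining your function, call `check_pig_latin(pig_latin)`; an empty list means all four examples pass, and each message otherwise shows the input, the expected output, and what your function returned.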