# Project 1 Tips


This notebook focuses on how to package your Python program in a professional manner.


## Source Control

If there's one thing I want you to take away from this document, it is: PLEASE USE GITHUB. That said, please also read the rest of this document. Project 1 is like climbing your first cliff after weeks of wall climbing; Git + GitHub are your tether line. As you work on bits of functionality, bug fixes, refactors, etc. you should get into the habit of committing said incremental work. Some of the benefits of using version control early and often are: 

* Backup in case anything goes wrong locally 
* Remote backup in case things go really wrong 
* The simple ability to "look back in time" at what may be deleted code
* Snapshots of code you can revert to if you've made a mistake
* History of evolution that will help you pick the code back up later on
* The ability to `git branch` to try things and discard/merge as necessary 
* Easy side-by-side code comparison (diffs) through Git plugins and pull requests on github.com
* Instructors can help you asynchronously via a GitHub link rather than (typically awful) screenshots of code 

I'm sure I'm missing others, but the fact of the matter is that version control systems are the de facto way of collaborating on code in industry. To be an effective data scientist / analyst / engineer, you will need to learn one at some point. The best way to do so is to practice. Don't be these guys:


<img src="https://imgs.xkcd.com/comics/git_2x.png" style="width: 300px;"/>


Refs: 
* https://git-scm.com/doc
* https://xkcd.com/1597/


## Project Structure

One of the most important things you can do when working on your project is to put files where convention says they belong. As covered previously, developers like yourself build functionality into logically grouped packages called `modules`. By doing so, we make the entire project easier to read, maintain, and extend.

Python applications typically follow a format like so: 

```
my-project/
|-- my-project/
|   |-- main.py
|   |   
|   |-- submodule_1/
|   |   |-- something.py
|   |   |-- thing1.py
|   |   |-- thing2.py
|   |   
|   |-- submodule_2/
|   |   |-- other_thing.py
|   |   |-- thing4.py
|   |   
|   |-- test/ (not required for project 1)
|   |   |-- test_main.py
|   |   
|-- README.md
|-- setup.py (not required for project 1)   
|-- requirements.txt (not required for project 1)
```

The structure above is for a generic project called `my-project`. In its (git/GitHub) "root" folder, a second folder with the project name contains the entire source code. Some developers like to name this folder `src` instead; either is fine. Underneath the source directory is `main.py`, the entry point of the application, alongside all supporting submodules. This structure is very clear and easy to maintain! Let's look at it in the context of a simple Slack bot for [Leetcode](https://leetcode.com/problemset/all/) questions, called `leetbot`.


```
leetbot
├── leetbot
│   ├── main.py
│   ├── leetcode
│   │   ├── posted_questions.py
│   │   ├── problem.py
│   │   └── questions.py
│   └── slack_client
│       └── client.py
├── README.md
└── requirements.txt
```

In this example, `main.py` is where the application starts. `leetcode` is a module that fetches and formats Leetcode.com questions. `slack_client` is responsible for interfacing with Slack via its API. This all comes together to post a Leetcode question to a specified channel every day. If you were hired to work on this bot and were asked to change the Slack message format, where would you look first? Probably in the `slack_client` module!
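To make the link between folders and imports concrete, here is a minimal sketch that rebuilds a tiny version of this layout in a temporary directory and imports from it. Everything in it is an assumption for illustration: the `_demo` folder names (chosen so they cannot clash with anything installed), the `describe()` and `format_message()` functions, and their contents are made up and are not taken from the real `leetbot` code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents, keyed by path relative to the source folder.
FILES = {
    "leetcode_demo/problem.py": (
        "def describe():\n"
        "    return 'a Leetcode problem'\n"
    ),
    "slack_client_demo/client.py": (
        "def format_message(text):\n"
        "    return f'Daily question: {text}'\n"
    ),
}

with tempfile.TemporaryDirectory() as source_dir:
    # Write each file, creating its folder first.
    for relative_path, content in FILES.items():
        file_path = Path(source_dir) / relative_path
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content)

    # Running main.py puts its own folder on sys.path; we imitate that here.
    sys.path.insert(0, source_dir)
    try:
        # Folder name + file name = import path (folder.file).
        problem = importlib.import_module("leetcode_demo.problem")
        client = importlib.import_module("slack_client_demo.client")
        message = client.format_message(problem.describe())
    finally:
        sys.path.remove(source_dir)

print(message)  # prints: Daily question: a Leetcode problem
```

The point is only that the folder and file names you pick become the names you type in `import` statements, so a sensible layout pays off every time someone reads or writes an import.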


Refs: 
* https://jcalderone.livejournal.com/39794.html
* https://newbedev.com/what-is-the-best-project-structure-for-a-python-application
* https://github.com/MuminKhan/leetbot

## Classes

Generally, each class should live in its own `.py` file. There are exceptions to this, mainly when you're using a nested inner class (like an enum), but related classes belong in the same module, not in the same file. You'll notice there are three files in the `leetcode` module below; at a glance, I would expect it to contain exactly three classes.

```
leetbot
├── leetbot
│   ├── main.py
│   ├── leetcode
│   │   ├── posted_questions.py
│   │   ├── problem.py
│   │   └── questions.py
│   └── slack_client
│       └── client.py
├── README.md
└── requirements.txt
```

You'll notice that all the file names are lowercase. This follows the [PEP8 standard on the matter](https://www.python.org/dev/peps/pep-0008/#package-and-module-names). The class names, however, use the CapWords convention, as the same standard recommends. This leads to imports that look like this:

```python
from leetcode.posted_questions import PostedLeetCodeQuestions
from leetcode.questions import LeetCodeQuestions
from leetcode.problem import LeetProblem
```

To recap: One class per file. Lower case module name. Lower case file name. CapWords convention class name. 
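As a rough sketch of what one of these files might contain, here is a hypothetical `problem.py` holding a single CapWords class. Only the `LeetProblem` name and the `question_id` attribute appear elsewhere in this notebook; the other fields and the `summary()` method are assumptions, and the real `leetbot` class may look quite different.

```python
# problem.py (hypothetical contents, for illustration only)
from dataclasses import dataclass


@dataclass
class LeetProblem:
    """A single Leetcode problem."""

    question_id: int
    title: str        # assumed field
    difficulty: str   # assumed field

    def summary(self) -> str:
        """Return a one-line description of the problem."""
        return f"#{self.question_id} {self.title} ({self.difficulty})"


# Elsewhere you would write: from leetcode.problem import LeetProblem
problem = LeetProblem(question_id=1, title="Two Sum", difficulty="Easy")
print(problem.summary())  # prints: #1 Two Sum (Easy)
```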


Refs: 
* https://docs.python.org/3/library/__main__.html
* https://github.com/MuminKhan/leetbot/blob/master/leetbot/main.py

## Main.py

Now, if you were instead hired to find an elusive bug that seemingly happens at random, where would you start looking? Well, you could try looking in the submodules, but you might get lost in the implementation details. It would likely be more efficient to begin at the beginning! This is why a lot of projects opt to use a `main.py` or similar. `main.py` can be called quite a few things, like `program.py` or the name of the program itself (`my-project.py` or `leetbot.py` from above). Typically this Python file also uses the `if __name__ == "__main__":` convention.

From the [Python docs](https://docs.python.org/3/library/__main__.html):

```python
# echo.py

import shlex
import sys

def echo(phrase):
    """A dummy wrapper around print."""
    # for demonstration purposes, you can imagine that there is some
    # valuable and reusable logic inside this function
    print(phrase)

def main():
    """Echo the input arguments to standard output"""
    phrase = shlex.join(sys.argv)
    echo(phrase)
    return 0  # exit status: 0 means success

if __name__ == '__main__':
    sys.exit(main())  # use main()'s return value as the script's exit code
```

This simple script, `echo.py`, prints whatever is passed to it, much like `$ echo` does (it also prints its own name first, because `sys.argv[0]` is the script path). At the bottom, the `if __name__ == '__main__':` block calls the `main()` function. In the Python docs version, `main()` returns an integer that is handed to `sys.exit()` and becomes the script's exit code. Even without knowing what the script does, one can immediately recognize that the program starts at `if __name__ == '__main__':` and trace it from there.

Another consequence of using `if __name__ == '__main__':` is that the guarded code only runs when the file is executed as a script. If `echo.py` is imported, nothing inside the `if __name__ == '__main__':` block runs, meaning you can safely import its functions without worrying about (potentially catastrophic) side effects. Putting the logic in a function called `main()` also lets you import and test it elsewhere.

Back to our `leetbot` example: the entire program, from start to end, is represented in `main()` and the `if __name__ == '__main__':` block:

```python
def main():
    posted_questions = PostedLeetCodeQuestions(args.data_file)
    problem = get_question(posted_questions)
    message = build_message(problem)

    if message is None:
        print('Nothing to post...')
        exit()

    response = slack_client.client.post_to_slack(message)
    if response is not None:
        posted_questions.add_posted_question_id(problem.question_id)
        posted_questions.write_posted_questions()


if __name__ == "__main__":
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler())
    args = parse_args()
    main()
```

Without knowing anything about how the program works, it's easy to tell that it's essentially: 

1. Parsing user args
2. Getting a leetcode question to post
3. Packaging that question for posting
4. Trying to post the message 

Imagine having to read across 5 different files to understand the flow of a program... It would be unpleasant at best and unmanageable at worst. By using the `if __name__ == '__main__':` convention and a `main()` function, we've captured our program's **"core loop"** in one place.
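To see the import-safety point from earlier in this section in action, here is a minimal sketch. It writes a small, made-up module (`greeter_demo.py`, with hypothetical `greet()` and `main()` functions) to a temporary folder, then both imports it and runs it as a script, capturing whatever each one prints. The standard-library `runpy.run_path(..., run_name="__main__")` call stands in for typing `python greeter_demo.py` at the terminal.

```python
import contextlib
import importlib
import io
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical module source, written to a temporary file for the demo.
MODULE_SOURCE = '''
def greet(name):
    return f"Hello, {name}!"

def main():
    print(greet("world"))

if __name__ == "__main__":
    main()
'''


def capture_stdout(func):
    """Call func() and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as tmp_dir:
    module_path = Path(tmp_dir) / "greeter_demo.py"
    module_path.write_text(MODULE_SOURCE)

    # Imported: __name__ is "greeter_demo", so the guarded block is skipped.
    sys.path.insert(0, tmp_dir)
    try:
        import_output = capture_stdout(
            lambda: importlib.import_module("greeter_demo")
        )
    finally:
        sys.path.remove(tmp_dir)

    # Run as a script: __name__ is "__main__", so main() runs.
    script_output = capture_stdout(
        lambda: runpy.run_path(str(module_path), run_name="__main__")
    )

print(repr(import_output))  # prints: ''
print(repr(script_output))  # prints: 'Hello, world!\n'
```

Importing printed nothing, yet `greet()` is still available to the importer; only running the file as a script triggered `main()`.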


Refs: 
* https://docs.python.org/3/library/__main__.html
* https://github.com/MuminKhan/leetbot/blob/master/leetbot/main.py


## The "Core Loop"

Core loop is a term that comes from game development. I like to use it to describe, in essence, your program's flow from start to finish. Sure, certain user choices might dramatically alter how they interact with your program. More often than not, however, the overarching program can be expressed in a `main()` function. For simple scripts and programs, it's trivial to write the core loop in the `main.py` file such that the program's entry point and nominal exit point are present, with the intermediary steps enumerated between them. This is also a great place to create and transform objects, pass arguments between them, get return values, etc. For simple programs I also like to keep all `print()` calls in the `main()` function so that there are no UI dependencies anywhere else.
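Here is one way that idea could look in a tiny program. This is only a sketch under made-up assumptions: the function names (`load_scores`, `average`, `build_report`) and the hard-coded sample data are hypothetical, and a real program would read its input from a file or the user.

```python
def load_scores():
    """Stand-in for reading input; returns hard-coded sample data."""
    return [88, 92, 79, 95]


def average(scores):
    """Pure helper: computes a value and never prints."""
    return sum(scores) / len(scores)


def build_report(scores):
    """Pure helper: turns the results into the text to display."""
    return f"{len(scores)} scores, average {average(scores):.1f}"


def main():
    """Core loop: every step of the program, in order, in one place."""
    scores = load_scores()         # 1. get the input
    report = build_report(scores)  # 2. do the work
    print(report)                  # 3. show the result (the only print)


if __name__ == "__main__":
    main()  # prints: 4 scores, average 88.5
```

Because the helpers return values instead of printing, they can be imported and tested on their own, and swapping `print()` for a file or a GUI later only touches `main()`.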

Refs: 
* https://medium.com/@davengdesign/core-loops-c98b1197e93d

## Parting Thoughts

Project 1 is significantly harder than the homeworks, not because it's more technically involved (though it can be) but because you have significantly more creative freedom. Off the top of my head, here are some Dos and Don'ts from someone who has been in your shoes before:


### Do
* Use a nice IDE like VS Code or PyCharm with Python support. Play around with the debug and refactoring features.
* Build a Minimum Viable Product (MVP), then increment and iterate
* Use Git/GitHub EARLY and OFTEN. Commits are free: use them, especially after major implementation/refactoring
* Carve time out over multiple days. Fresh eyes can do wonders for productivity.
* Refactor your code every so often. It doesn't have to be perfect the first time, but don't write a swamp and try to fix it later. 
* [Comment](https://www.askpython.com/python/python-comments) your code WELL. Not only will this pay dividends while working on the project, it's a major part of your grade.
* Follow the [PEP-8 style guide](https://www.python.org/dev/peps/pep-0008/), especially for naming conventions. You can ignore the 79-character line limit (160-200 is fine).
* Consider running your code through a linter like pylint or flake8. This can often be done [inside a modern IDE](https://code.visualstudio.com/docs/python/linting).
* Be cognizant of priorities: You should work on critical features first instead of trying to do it all in one go. 
* Rubber duck debug if things are going wrong and you can't figure out why 
* Ask for help when you get stuck
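
As a small illustration of the commenting and naming advice above, here is a sketch of what well-commented, PEP-8-style code can look like. The function and class are made up for the example; the thing to notice is that docstrings say what each piece does, inline comments explain why, and the names follow `snake_case` for functions and variables and CapWords for classes.

```python
def count_vowels(text):
    """Return the number of vowels (a, e, i, o, u) in text, ignoring case."""
    vowels = "aeiou"
    # Lower-case once up front so the membership test stays simple.
    return sum(1 for character in text.lower() if character in vowels)


class WordCounter:
    """Tracks how many times each word has been seen."""

    def __init__(self):
        self.word_counts = {}  # maps word -> number of occurrences

    def add(self, word):
        """Record one occurrence of word."""
        # dict.get() with a default avoids a separate "first time seen" branch.
        self.word_counts[word] = self.word_counts.get(word, 0) + 1


counter = WordCounter()
for word in ["git", "commit", "git"]:
    counter.add(word)

print(count_vowels("Project One"))  # prints: 4
print(counter.word_counts)          # prints: {'git': 2, 'commit': 1}
```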


### Don't
* Do the opposite of the do's
* Try to start this the weekend before it's due
* Design yourself into a box by being afraid of change
* Be afraid to experiment and throw things away
* Put more than one class in a .py file unless the second class is nested (like an enum)
