# python

## why might you use `python`

[`python`](https://www.python.org/) is one of the most popular scripting and programming languages in the world. there are, [like, ninefinity different ways of ranking programming languages](https://www.python.org/), and `python` sits in the top 5 of almost every one of them.

I have used it on every single project I've ever worked on. I am an unashamed `python` fanboy.

I'm also not really one for the `R` vs. `python` holy wars -- the two have different use cases and any conversation that attempts to settle "which is better" is already fundamentally flawed, in my opinion. That being said, I'd like to make the following case as to why you should *learn* `python`, even if it doesn't become your go-to data science language:

##### `python` is much more common than `R` *outside* the statistics community

this is partly a product of various biases, but it also speaks to something fundamental about the differences between the languages: `R` is a very *deep* language in a very *narrow* field of concepts (namely, statistics), whereas `python` is among the most *broad* and *flexible* languages, without a single central purpose.

Another reason this matters: the barrier to entry for a company or government agency's IT department will be lower (or already surpassed) for `python` and `python` packages if for no other reason than that computer engineers are familiar with it by default.

##### there is a package for that

this is a corollary of the previous point: if there is a thing you want to do, it is very likely that someone has already done it in `python`, and that their work is available for you to use.

For a point of comparison, there are 11,282 `R` packages on `CRAN`, and 115,497 `python` packages on `PyPI` (the Python Package Index).

```python
import antigravity
```

##### it does all the most basic things well

although it is available and well-supported on every OS, `python` is very much a linux-focused language. it "grew up" in linux as an alternative scripting language (alternative to `bash` and other `shell` scripts). because of this, it acquired some of the linux philosophy points, and specifically those that focus on simplicity.

many Linux system tools are themselves `python` scripts under the hood, which means that essential things like web scraping, emailing, scheduling and timing, networking, logging, and database access are all possible and well supported in `python`, often directly from the standard library.

##### it is fun

`python` was created with the express goal of being as simple to program in as possible. most of the syntax and rules are specifically designed to make the language read as much like pseudo-code as possible, so code is easy to read.

the community also has an edge to it. a good example: start a python session and type

```python
import this
```

okay, so it's a particular type of fun for a particular type of person. but given the prior that you're in this class, I suppose that's a safer prediction.

## why might you not?

### version 2 vs. 3

I'd be remiss not to mention one of the major red marks against `python` -- the infamous "2 vs. 3" upgrade controversy.

in order to make some very low-level, backwards-incompatible changes to the language (primarily to clean up long-standing design problems and to support international languages, e.g. by making all strings Unicode by default), the developer community chose to make a new major version of `python`: `python3`.

the process caused a lot of confusion among newcomers to the language -- which was exploding in popularity at around the same time -- and also put a large burden of uncertainty on corporate developers and development.

the bottom line, in my opinion, is this:

***unless you have no other option, you should always use python 3 and only python 3***
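to make the "2 vs. 3" split a little more concrete, here's a minimal sketch of two of the most visible behavior changes in `python3` (integer division and unicode strings). it is by no means a complete list of the differences, and the version check at the top is just one common way to fail fast on the wrong interpreter:

```python
import sys

# fail fast, with a clear message, if this is run under python 2
if sys.version_info < (3,):
    raise SystemExit("this code requires python 3")

# division: `/` is always "true" division in python 3, `//` is floor division
# (in python 2, 7 / 2 between two ints gave 3)
assert 7 / 2 == 3.5
assert 7 // 2 == 3

# text: every str is unicode in python 3; encoded bytes are a separate type
word = "caf\u00e9"  # the word cafe, with an accented final e
assert len(word) == 4                   # four characters...
assert len(word.encode("utf-8")) == 5   # ...but five bytes as UTF-8
assert isinstance(word.encode("utf-8"), bytes)

print("looks like python 3 to me")
```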

# the *engineering* side of `python`

often I feel data scientists end up so eager to dive into the cool things you can do with the language that they ignore one of the real reasons to work in `python` in the first place: as a language nurtured by a software development community, it has solved (in good ways and terrible ways) various problems that make production-izing code easier (or, sometimes, possible at all).

## packages

a given file of executable `python` code is probably best referred to as a "script", but a collection of scripts which expose some sort of interface to a user to do "something" is generally called a "library" or a "package".

This is mostly the same convention as in the `R` community -- think of the differences between scripts you wrote and `dplyr` and all the other stuff Hadley wrote.

### my favorite packages

So what sorts of `python` packages should you use?

first of all, the builtin packages are pretty great, and cover a wide range of the most necessary use cases for a programming language (e.g. file i/o and os utilities and tie-ins). The ones I use most often are:

+ `argparse` - reading in and parsing command line arguments
+ `collections` - sets of "collection" objects (e.g. ordered dictionaries, named tuples, default dictionaries)
+ `csv` - for reading and writing delimited files
+ `datetime` - the fundamental date object and utilities library
+ `functools` - functional tools, including fancy stuff like partial function definitions and caching
+ `itertools` - an awesome library of utilities for iterating through collections of items
+ `json` - for parsing and constructing well formatted JSON
+ `logging` - for logging messages to console, file, etc
+ `os` - operating system interaction (I use this in almost every single program)
+ `pickle` - a `python`-native serialization protocol, for saving `python` stuff
+ `random` - a decent (if not special) randomization library
+ `re` - regular expression parsing library
+ `time` - a generic OS-level time interface

for any `python` installation, these *already exist* -- no installation necessary
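as a quick taste, here's a sketch that pokes at a few of the modules above (`collections`, `functools`, and `itertools`). the word list and the `fib` function are made up purely for illustration:

```python
from collections import Counter, defaultdict, namedtuple
from functools import lru_cache
from itertools import chain, islice

# collections: count things, group things, and give your tuples field names
words = "the cat and the hat and the bat".split()
counts = Counter(words)
print(counts.most_common(2))  # [('the', 3), ('and', 2)]

by_letter = defaultdict(list)  # missing keys start out as an empty list
for w in words:
    by_letter[w[0]].append(w)

Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)

# functools: cache the results of an expensive (here, recursive) function
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040, fast because every sub-result is cached

# itertools: lazily stitch iterables together and take the first few items
first_five = list(islice(chain(range(3), "abc"), 5))  # [0, 1, 2, 'a', 'b']
```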

there are also a ton of great open-source libraries for just about any purpose you might imagine. Again, the ones I use most often:

+ `flask` - a `python` web framework (for standing up webpages)
+ `ipython` - the best interactive shell, it just makes the normal python program look silly
+ `jupyter` - the interactive extension of the above (`ipython`, this is what is used to make this bodacious document you see before you)
+ `lxml` - a fast and flexible XML / HTML library
+ `matplotlib` - a plotting library that is super useful but will make `R` users dream of their former glory
+ `nltk` - Natural Language Tool Kit, a library for language processing and text analytics
+ `numpy` - NUMerical PYthon, a lot of super duper array and linear algebra glue code to make C and FORTRAN routines available in `python`.
+ `pandas` - PANel DAta, a dataframe interface for feature data. This is the main data science library in `python` and, again, I use it in almost every single program
+ `plotly` - an amazing plotting library
+ `psycopg2` - a `postgres` library
+ `requests` - the main web GET and POST library
+ `scipy` - SCIentific PYthon, an extension of `numpy` that adds more scientific utilities
+ `scrapy` - a flexible but easy web scraping framework
+ `seaborn` - something you import whenever you use `matplotlib` to make your plots non-heinous (also has some useful functions that no one has discovered yet)
+ `selenium` - a browser automation library (for pages that need a real browser to run their javascript, i.e. when `requests` isn't good enough)
+ `sklearn` - the other half of the primary data science workflow, an all-purpose modeling library (installed as `scikit-learn`)
+ `sqlalchemy` - an ORM library for most sql databases. It's pretty flashy and when you finally need it, you'll know in your heart.
+ `tqdm` - a fancy-pants progress bar library. You don't need it, but you want it.
+ `yaml` - a library (installed as `pyyaml`) for parsing the world's greatest configuration format, YAML ("YAML Ain't Markup Language", originally "Yet Another Markup Language")

### installing packages

So, let's take a journey together.

Unlike `R`, the folks who put `python` together thought that people should care about the versions of the packages they installed. They didn't really do anything to make this happen in a sane way, though, so there were like ten different ways to install packages. 

If you learned `python` in the early days, you probably heard it was hard to install packages. Well, it was. Maybe it still is, depending on your attitude. That's right, I'm blaming the victim.

Really, though, I'm sorry. If you're coming to `python` from `R` this probably feels silly. Why not just have an `install` function and install whatever you want? 

Why? Basically, because that's a bad idea for writing production-level software.

production-level software is meant to be deterministic, and to be stable. Software that has the ability to install packages within the language has several disadvantages:

1. avoids administrator oversight
    1. having to ask your admin to install something is a *good* thing
2. could install something malicious or broken without anyone knowing
3. could install different versions on different machines at different times

Basically, the versions of all your packages matter, so you should care about that stuff. The `python` community is pretty strict about that and has gone to great lengths (and, like, 15 different methods) to try and solve that problem. And today, that means that everyone is doing one of the following:

+ using `pip` ("Pip Installs Packages", and yes, recursive acronyms are annoying)
+ using `pip`, but in a virtual environment
+ using `conda` (virtual environments on steroids or amphetamines, depending on whether you're a data scientist or a sysadmin, respectively)

I advocate using `conda` for many reasons -- more on this later.

## environments

### basic environments

Let's take a quick python version poll:

on your laptop (not your `ec2` instance), what `python` version do you have installed?

```bash
python --version
```

different versions of `python` (and different versions of installed packages) have different files defining the language's behavior and thus different levels of compatibility. personally, I think knowing that these files exist is among the more important things to understand about `python`.

***the way that the code you wrote behaves depends on these files***

recall the `which` command, which tells us the path that will actually be called when we type in a command

```bash
which python3
```

your out-of-the-box `ec2` instances will likely return `/usr/bin/python3`. so when you type `python3` on the command line, you will actually call the executable file `/usr/bin/python3`.

the same sort of thing is going on for individual `python` modules we import. Every module has a "private" member `__file__` which lists the path to the file used to define that module:

```python
import os
os.__file__
```
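a few more places to look, from inside `python` itself. this is a small sketch using `sys.executable`, `sys.prefix`, and `sys.path` (none of which appear elsewhere in these notes); the exact paths it prints will depend entirely on your machine and environment:

```python
import os
import sys

# the interpreter binary running this code (often, but not always, the same
# thing `which python3` reports)
print(sys.executable)

# the root folder of the active python environment
print(sys.prefix)

# where the `os` module was loaded from
print(os.__file__)

# the folders python searches, in order, when you `import` something
for entry in sys.path:
    print(entry)
```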

let's look at that file!

```bash
# for you, it is:
less /usr/lib/python3.5/os.py

#for me, right now, it'll be different -- hence the craziness below. sorry!
```

```bash
OS_FILE=$(python -c "import os; print(os.__file__)")
cat "$OS_FILE"
```

if you change that file, or your friend (who is running your code) doesn't have that same file, the code that uses `os` will behave differently.

the same caveat goes for every file or environment variable used by your python process on any machine. This collection of files is often called the "`python` environment", and it can be different on any system.

in the real world, the implication is immediate: if one of my programs only works with version 1.2 of some library, another only works with version 2.1, and the `GOVERNMENT AGENCY NAME REDACTED` sysadmin just installed version 1.0 (and *that* took two years), this will probably be a problem.

It would be nice if this problem was solved...

### virtual environments

"virtual environments" are ways of isolating out the contents (the files) of libraries you're installing.

This is something you've probably done in `R` without knowing it. if you've ever tried installing a package but didn't have admin rights, the `R` interpreter prompts you to see if there's some other place you'd like to install things (usually in your home directory).

that is a system-level isolation of the files you want to install. When the interpreter is told to load a package, it looks first for your local copy to see if you have anything spicy, and then the global copy, and then it cries.

So, generalize that idea: let's make *many* separate environments (collections of files defining how our `python` code behaves).

We can generalize this beyond just "global" and "user" (as with `R`), even creating a separate environment for each process or code base.

On a very basic level, all we're doing here is re-installing packages into a special sub-directory somewhere on the machine, and then telling `python` (through environment variables like the `PATH` variable) where to look to find them. 

We're tricking `python` into doing the right thing. and `python` is cool about it; once it realizes it's been tricked it's not even mad or anything, it's strong in our relationship and knows that it was all a bit of a goof and what's more, we all actually really had a great time and made some good memories.
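to make the "special sub-directory" idea concrete, here's a sketch using the standard library's `venv` module (a built-in cousin of `virtualenv`, available since `python` 3.3). it is not what I'll end up recommending below, and `make_env` and `in_virtualenv` are hypothetical helper names for illustration. note that `in_virtualenv` only detects `venv`-style environments, not `conda` ones:

```python
import os
import sys
import tempfile
import venv


def make_env(env_dir):
    """Create a bare virtual environment (no pip) and return its python path."""
    # a venv is "just" a folder: a copy of (or symlink to) the interpreter,
    # plus its own empty site-packages for installed packages
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)


def in_virtualenv():
    """True when the *current* interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_python = make_env(os.path.join(tmp, "l33tmode"))
        print(env_python, os.path.exists(env_python))
    print("running inside a venv:", in_virtualenv())
```

"activating" an environment like this mostly amounts to putting its `bin` (or `Scripts`) folder at the front of your `PATH`, so that `python` resolves to the copy inside it.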

Often, finished `python` projects will ship with a `requirements.txt` file, which lists each `python` package which should be installed and the exact version that it was tested against; the expectation is that it will be run on a system with the same packages and versions.

The "virtual environment" is an isolated set of packages that will meet that requirement.
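a minimal sketch of what "meeting that requirement" means, assuming a `requirements.txt` made only of exact `name==version` pins (real ones can contain much more). it uses `importlib.metadata`, which needs `python` 3.8+; the pinned versions below are made up, and in practice `pip freeze` and `pip install -r requirements.txt` do the heavy lifting:

```python
from importlib.metadata import PackageNotFoundError, version  # python 3.8+


def parse_pins(text):
    """Parse `name==version` lines from requirements-style text.

    A deliberately tiny sketch: it skips blanks, comments, and anything that
    isn't an exact `==` pin.
    """
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """Return {name: (pinned, installed)} for every package that doesn't match.

    `installed` is None when the package isn't installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# hypothetical requirements.txt contents, for illustration only
requirements = """
pandas==0.20.1   # the version we tested against
numpy==1.12.1
"""
print(check_pins(parse_pins(requirements)))
```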

The original way of creating a virtual environment was the python utility `virtualenv`, which is awesome and worth checking out. That being said, however, it's not what I'll recommend. Instead, I'll recommend...

## generalizing virtual environments: `conda`

`conda` is the package and environment manager at the heart of `anaconda`, a *distribution* of python. it takes the virtual environment concept above and adds a special wrinkle: while most virtual environment managers allow you to install different versions of `python` *packages*, `conda` allows you to install different versions of `python` *itself*.

this should help you deal with any `python2` vs. `python3` problems you may experience.

so, let's go ahead and install `conda`, create a virtual environment, and install something.

*note: I would recommend you install `conda` on both your laptop and your `ec2` instance, but we will **require** you to install it on your `ec2` instance (it's part of the homework), so you may want to use that instance to do all of this right now*

#### installing `conda`

`conda`, by default, comes with many of the most commonly downloaded `python` packages. This is great because it gives you a pretty solid working base without any modification, *BUT* given our time and bandwidth limits, I'm going to recommend you install the `miniconda` version (the bare bones) and install packages *as needed* instead of up front.

+ [`anaconda`](https://www.continuum.io/downloads): a big installation, which will take a few minutes, and pre-installs several of the "must haves" (many of the above, and maybe more)
+ [`miniconda`](https://conda.io/miniconda.html): a bare-bones implementation of the above for the *discerning* gentleprogrammer

Download that stuff. Then follow the instructions on the download page, which will probably say:

```bash
bash Miniconda_some_other_stuff_.sh
```

And then, once everything is done:

```bash
conda update conda
```

<div align="center">**everyone installs `conda`**</div>

note: the download link for the miniconda bash script *will change*! update it by actually going to [the miniconda website](https://conda.io/miniconda.html)

+ go to [the miniconda website](https://conda.io/miniconda.html) to get the bash script name
    + we are looking at the 64-bit linux installer
+ download that bash script to your `ec2` server and run it

```bash
cd ~
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# when prompted, we do the following:
# press ENTER to read the license
#     press `d` to scroll *d*own
# yes: approve the license
# ENTER: we are okay with this location
# yes: allow your path to be updated to *always* include conda

conda update conda
```

recall that we previously called

```bash
which python3
```

and got `/usr/bin/python3`, and we also checked the file path to the `os` package (from within a `python` shell):

```python
import os
os.__file__
```

what do we get now, after installing `conda`?

*everything* about `conda` is installed in one and only one directory. "uninstalling" `conda` is equivalent to simply deleting that directory.

the act of creating an environment creates a new folder under the `envs` sub-directory in that main `conda` directory and installs all of our required packages there. Let's look into that right now:

```bash
conda create -n l33tmode python=3
```

will use `conda` to create an environment named "`l33tmode`" with `python` version 3 installed.

as the little dialog will state after you create the environment, you have to "activate" that environment if you want to use it. You have to do this any time you want to use a virtual environment.

what we're *actually* doing here is updating the `PATH` environment variable to "point" `python` to our newly created set of files. Now, when we wish to use `python`, we will be using our specialized, isolated versions.

So let's do that:

```bash
# mac or linux:
source activate l33tmode

# windows
activate l33tmode
```

This should have made our terminal prompt 10 times l33t3r. To verify that we're now looking at different files:

```bash
which python3
```

Now let's install some stuff

```bash
conda install jupyter ipython
```

and then try it out

```bash
ipython
```

this should open a fancier python interpreter (`ipython`).

# actually writing `python` code

## interactive shells: `ipython` and `jupyter notebook`

the default `python` command opens a vanilla `python` shell, where you can execute any of the `python` commands your heart desires. that being said, the experience is obviously lacking the bells and whistles of any modern code development or execution environment.

for your personal use, `IPython` (interactive `python` shell) and `jupyter notebook`s are as close to *must install* packages as it gets.

I personally think of `ipython` as being the primary means of developing software, and `jupyter` as being almost exclusively for exploratory documents and presentations, but you should do whatever works for you!

the documents and slideshows we've been using as lecture notes this whole time were created with `jupyter`, a cool `python` package which allows you to execute interpreted `python` commands in a "notebook" format, where commands and notes are isolated into separate "cells" that can be executed on demand.

There are a couple of popular "ways" of developing `python` code, and `jupyter` notebooks are probably the most popular.

I highly recommend becoming familiar with both, but particularly `jupyter`!

## editors and `IDE`s

there are a multitude of options for developing code in `python`, and the choice really comes down to your personal preferences. If you've "grown up" coding in `RStudio`, then you probably expect a windowed environment where you can write scripts, execute blocks, visualize output, and explore objects, and you are probably going to favor one of:

+ [`rodeo`](https://www.yhat.com/products/rodeo)
+ [`spyder`](https://pythonhosted.org/spyder/)

personally, I'm a huge fan of developing code in side-by-side terminals -- one for a regular editor and another for an `IPython` session. I can copy and paste code for quick execution, but I *also* have a workflow which forces me to write real modules with callable functions I can re-import as I develop. It's a way of making sure the code I write becomes something I can deploy.

# a crash course of stuff you should know or learn!

I know that Stuart covered `python` in his course, so this may be overkill. If you're a `python` pro, bear with me -- sit back and bask in your total l33tness while we take a lightning tour of things that I think are #important.

some of these topics may feel a little out of left field, but they are things I've learned that I think are essential (but not sufficient) to being a good `python` programmer.

## code structure and organization

+ [PEP 8](https://www.python.org/dev/peps/pep-0008/), the official `python` style guide, was a really good idea. you should follow it
+ keep code in files called "someshortword.py"
+ there are basically two types of `py` file:
    + modules: I can run `import thisthing` in a `python` session and nothing happens, but now I have new `python` toys
    + scripts: I can run `python thisthing.py` from a bash shell and it *does a thing*
    + if your file does a combination of those two, you should ask yourself why
+ if I run `import thisthing` and *something happens*, that is almost always not a good idea

a bad idea

```python
# thisthing.py

import pandas as pd
import sklearn.neural_network

x = pd.read_csv('magicdata.csv')
y = pd.read_csv('easytarget.csv')
m = sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(100, 100), random_state=1337)

m.fit(x, y)
```

a better idea

```python
# thisthing.py

import pandas as pd
import sklearn.neural_network


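def load_xy(xfile='magicdata.csv', yfile='easytarget.csv'):
    """Read the features (x) and the target (y) from csv files."""
    x = pd.read_csv(xfile)
    y = pd.read_csv(yfile)
    return x, y


def model(x, y):
    """Build and fit a neural network classifier, then return it."""
    # hidden_layer_sizes must be positive integers (one per hidden layer)
    m = sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(100, 100), random_state=1337)
    m.fit(x, y)
    return m


def main(xfile='magicdata.csv', yfile='easytarget.csv'):
    """Only runs when called (or when this file is run as a script)."""
    x, y = load_xy(xfile, yfile)
    m = model(x, y)
    print(m.coefs_)
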

# more on this later...
if __name__ == '__main__':
    main()
```

## `io` operations

### reading and writing files

for many people, the idea of "opening a file" is not any different than saying "go get this file and give me all the stuff in it". this is often basically all you want to do, after all. however, there are actually many different things you might want to do with a file:

1. replace or remove all of the occurrences of a word
2. load the first 100 lines only
3. search through a file to find out what line a particular string is on
4. replace all windows-style carriage returns with new lines

now, you could just load an entire file to a string object, make your changes, and write it out. that's fine until you get to a file that is several GBs.

most of the things I mentioned above that you might want to do involve *iterating* through a file one character or line at a time. this is the fundamental way that `python` handles files.

`python` interacts with the file system through a concept called a "file object," which you can basically think of as a cursor pointing at a particular position (an offset) within a file. given where this cursor currently is, the file object can read the next few characters, the next line (etc.), or write new contents to the file.
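a small sketch of the cursor idea, using the file object's `read`, `readline`, `tell`, and `seek` methods. the file name is arbitrary, and `tempfile.gettempdir()` is used instead of `/tmp` so it also works on windows. note that `for line in f` reads the file one line at a time, which is why it's the go-to approach for those several-GB files:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "cursor_demo.txt")

with open(path, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

with open(path, "r") as f:
    start = f.tell()              # 0: the cursor starts at the beginning
    head = f.read(5)              # 'first': reads 5 characters, cursor moves
    rest_of_line = f.readline()   # ' line\n': the rest of the current line
    # iterating picks up wherever the cursor currently is
    remaining = [line.rstrip("\n") for line in f]
    f.seek(0)                     # rewind the cursor to the start
    again = f.readline()          # 'first line\n'

print(start, repr(head), repr(rest_of_line), remaining, repr(again))
```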

the main function for interacting with files is the `open` function. 

```python
help(open)
```

```python
f = open('/tmp/testfile.txt', 'w')

f.write('hello')

# flush pushes anything still sitting in python's write buffer out to the file
f.flush()
```

```bash
less /tmp/testfile.txt
```

```python
f.write('world')
f.flush()
```

```bash
less /tmp/testfile.txt
```

yep -- you even have to write the new line characters:

```python
f.write('\n')
f.write('hello\n')
f.write('world')
f.flush()
```

```bash
less /tmp/testfile.txt
```

note: you *have to close* file objects!

```python
f.close()
```

so, this may feel a little low-level and annoying, and also overkill for some of our purposes. well, it is. people much smarter and better at programming in `python` did us a solid by writing a bunch of libraries to handle the reading and writing of data.

that being said, *it is super common* that a function wants to take a *file object* and not a name of a file. so you should get used to the idea that you might have to take the extra step of using the `open` function to create a file object from a file name.
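for example, the standard library's `json.load` wants a file object (its sibling `json.loads` takes a string). below, `load_config` is a hypothetical helper in the same style; the second call shows that `io.StringIO`, an in-memory text buffer, can stand in for an opened file, which is handy for testing:

```python
import io
import json
import os
import tempfile


def load_config(fileobj):
    """Hypothetical helper that, like json.load, wants a *file object*."""
    return json.load(fileobj)


# 1. from a real file: turn the file name into a file object with `open`
path = os.path.join(tempfile.gettempdir(), "config.json")
with open(path, "w") as f:
    json.dump({"name": "zach", "mood": "groovy"}, f)

with open(path, "r") as f:
    from_disk = load_config(f)

# 2. from a string in memory: io.StringIO behaves like an open text file
from_memory = load_config(io.StringIO('{"name": "zach", "mood": "groovy"}'))

print(from_disk == from_memory)  # True
```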

### `os`

the `os` module has, basically, one goal: handle all the stuff that is different between different operating systems for you. 

the best example of this is file paths. suppose I want to create a file three directories below the current location: how do I write that path?

```bash
# in windows:
subdir1\subdir2\subdir3\myfile.txt

# in linux:
subdir1/subdir2/subdir3/myfile.txt
```

it'd be sad if such a dumb difference broke our script.

in steps the `os` module:

```python
import os
os.path.join('subdir1', 'subdir2', 'subdir3', 'myfile.txt')
```

my recommendation: never write a path in `python` again, ever, for any reason. always use `os.path`

the way that `os` joins those directories together is by using the `os.sep` character

```python
os.sep
```

note that if we want to create a path relative to the root directory, then, we could do the following:

```python
os.path.join(os.sep, 'tmp', 'myfile.txt')
```

another very useful part of the `os` module is the `environ` dictionary object, which is an OS-agnostic way of loading all of the environment variables:

```python
os.environ
```

note: this is a `python` dictionary-like object:

```python
os.environ['PWD']
```

there are a ton of other goodies in the `os` module, but you'll learn them in due time.
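a sketch of a few of those goodies: taking paths apart with `os.path`, reading an environment variable with a fallback via `os.environ.get`, and creating nested directories with `os.makedirs`. the `MY_APP_DATA_DIR` variable and the `~/data` default are made up for illustration:

```python
import os

path = os.path.join("subdir1", "subdir2", "subdir3", "myfile.txt")

directory = os.path.dirname(path)             # everything before the last separator
filename = os.path.basename(path)             # 'myfile.txt'
stem, extension = os.path.splitext(filename)  # ('myfile', '.txt')

# read an environment variable, falling back to a default if it isn't set
# (MY_APP_DATA_DIR is a made-up variable name, for illustration only)
data_dir = os.environ.get(
    "MY_APP_DATA_DIR", os.path.join(os.path.expanduser("~"), "data")
)

# create nested directories without erroring if they already exist
# (commented out so this sketch doesn't touch your file system)
# os.makedirs(data_dir, exist_ok=True)

print(directory, filename, stem, extension, data_dir)
```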

### `csv`

before `pandas` dataframes, there were lists of dictionaries:

```python
[
    {'col0': val00, 'col1': val10, 'col2': val20},
    {'col0': val01, 'col1': val11, 'col2': val21},
    {'col0': val02, 'col1': val12, 'col2': val22},
    {'col0': val03, 'col1': val13, 'col2': val23},
]
```

this is one `pythonic` way of representing a csv file: records as dictionaries, and key-value pairs corresponding to header field names and values.

the `csv` module (and specifically the `csv.DictReader` and `csv.DictWriter` objects) allows us to read and write csv files into this data structure.

```python
import csv
import os

x = [
    {'a': 1, 'b': 2, 'c': 3},
    {'a': 100, 'b': 200, 'c': 300},
]

# I'll explain what this "with" thing is later
# (the csv docs recommend opening csv files with newline='')
with open(os.path.join(os.sep, 'tmp', 'myfile.csv'), 'w', newline='') as f:
    c = csv.DictWriter(f, fieldnames=['a', 'b', 'c'])
    c.writeheader()
    c.writerows(x)
```

```bash
less /tmp/myfile.csv
```

and now we could read it (or any csv) in:

```python
import csv
import os

# I'll explain what this "with" thing is later
with open(os.path.join(os.sep, 'tmp', 'myfile.csv'), 'r', newline='') as f:
    c = csv.DictReader(f)
    # note: c is a reader that wraps the file object; you still need to
    # iterate through it to get all the records!
    x = list(c)

# note: every value comes back as a string, e.g. '1' rather than 1
x
```

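depending on your `python` version, each record comes back as either an `OrderedDict` (`python` 3.6 and 3.7) or a plain `dict` (3.8 and later, where regular dictionaries already remember insertion order). an `OrderedDict` is a special class (from the `collections` module) which is simply a dictionary where the order of the keys is remembered.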

##### why should you ever do this?

generally speaking, you will probably want to read `csv` files in with `numpy`, `scipy`, or `pandas`. however, it is possible you might be in an environment where those are not made available to you.

first, ask yourself why you are acting as a data scientist but not allowed to use actual data scientist tools. then, remember that the answer *does* exist in the standard library, and see what you can figure out.

### context managers and the `with` statement

so what was the deal with

```python
with open(filename, 'r') as f:
    pass  # blah blah
```

remember how I said that you *absolutely have to close file objects*?

note how I'm not doing that here?

a [*context manager*](https://docs.python.org/3.6/reference/datamodel.html#context-managers) is an object designed to be used with the `with` statement, such that

1. you create some object and bind it to a name
    1. the result of `open(filename, 'r')` is called `f`
2. you "enter" a context
    1. internally, the context object has an `__enter__` method which "does something"
        1. the file object is handed back to you, ready to use
        2. a database connection is initialized
3. after you've done all the code in the indented block, you "exit" the context
    1. the context object has an `__exit__` method which "cleans up", and it runs *even if the block raised an exception*
        1. the file object is closed
        2. pending database work is wrapped up (committed or rolled back), and the database connection is closed
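to see the enter/exit steps in action, here's a sketch of a hypothetical `Resource` context manager that just logs what happens, followed by the same idea written with `contextlib.contextmanager`. the point to notice is that the "close" step runs even though the block raises an error:

```python
import contextlib


class Resource:
    """A hypothetical resource that records when it is opened and closed."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def __enter__(self):
        self.log.append("open")
        return self  # this is what gets bound by `as ...`

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("close")
        return False  # don't swallow exceptions; let them propagate


r = Resource("demo")
try:
    with r as res:
        res.log.append("work")
        raise ValueError("something went wrong mid-block")
except ValueError:
    pass

print(r.log)  # ['open', 'work', 'close']: cleanup ran despite the error


# the same idea, written as a generator with contextlib
@contextlib.contextmanager
def resource(log):
    log.append("open")
    try:
        yield log              # the body of the `with` block runs here
    finally:
        log.append("close")    # runs on normal exit *and* on exceptions


events = []
with resource(events) as ev:
    ev.append("work")
```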

recommendation: *never* open a file with

```python
# bad way -- very bad, no! bad!
f = open(filename, 'r')

# do stuff

f.close()
```

but *always* use context managers

```python
# yaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!
with open(filename, 'r') as f:
    pass  # do stuff
```

this way you will *never forget* and if the context object (here, a file object) ever gets more complex or requires more clean up *you don't have to care*, and not having to care is the very heart of good programming.

## string formatting

you should read [this entire format string syntax page](https://docs.python.org/3.6/library/string.html#formatstrings).

the basic gist of it, though, is that every string object `s` in python has a member function

```python
s.format(...)
```

and this can be used to replace elements within the string that are coded within `{}` characters. There is a large and highly flexible mini-language for doing this. for example

```python
myname = 'zach'
print('hello {}, how are you today'.format(myname))

me = {
    'name': 'zach',
    'mood': 'groovy',
}
print('{name:} is feeling {mood:} today'.format(**me))

s = ' my title '
print('{:-^100}'.format(s))
```
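a few more corners of the mini-language that come up constantly (precision, thousands separators, alignment, padding), plus the `f"..."` string syntax added in `python` 3.6, which uses the same format specs. the values are arbitrary:

```python
price = 1234.5678
name = "zach"

print("{:.2f}".format(price))          # 1234.57      (two decimal places)
print("{:,.2f}".format(price))         # 1,234.57     (thousands separator)
print("{:>10}|".format(name))          # "      zach|" (right-aligned, width 10)
print("{:<10}|".format(name))          # "zach      |" (left-aligned, width 10)
print("{:08.3f}".format(3.14159))      # 0003.142     (zero-padded to width 8)
print("{!r}".format(name))             # 'zach'       (repr() instead of str())
print("{0} {1} {0}".format("a", "b"))  # a b a        (positional fields can repeat)

# python 3.6+ f-strings: same mini-language, written inline
print(f"{name} owes ${price:,.2f}")    # zach owes $1,234.57
```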

a few more quick hits:

+ string formatting: do the `{}` thing
+ plotting: if you're using `matplotlib`, just import `seaborn`, and consider `plotly`
+ `pandas`: don't dismiss it so quickly
+ iteration: get to know `itertools`, list comprehensions, and generator expressions
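a quick sketch of those iteration ideas side by side. the `naturals` generator is a made-up example of an infinite stream; the point is that `islice` can take just the first few items without ever building the whole thing:

```python
from itertools import groupby, islice

numbers = range(10)

# list comprehension: builds the whole list at once
squares = [n * n for n in numbers if n % 2 == 0]   # [0, 4, 16, 36, 64]

# generator expression: same syntax with (), but values are produced lazily,
# one at a time, so no intermediate list is stored
total = sum(n * n for n in numbers if n % 2 == 0)  # 120


def naturals():
    """A hypothetical infinite generator: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


# take the first 3 squares of an infinite stream
first_three = list(islice((n * n for n in naturals()), 3))  # [0, 1, 4]

# groupby groups *consecutive* items that share a key (so sort first)
words = sorted(["apple", "avocado", "banana", "blueberry", "cherry"])
by_letter = {k: list(g) for k, g in groupby(words, key=lambda w: w[0])}
```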

### that `__main__` thing

at the very beginning I wrote a "better idea" version of a module file, and it ended with this block:

```python
if __name__ == '__main__':
    main()
```

can someone explain what is happening with that block?

as I said above, there are two types of `python` files: modules that provide functions for doing things, and scripts that actually do things.

modules all have a "name" member variable which is accessible via

```python
mymodule.__name__
```

(pronounced "mymodule dunder name").

for example:

```python
import os
os.__name__
```

```python
import logging.config
logging.config.__name__
```

the `__name__` variable value can be hard-coded to be something special within the source code of the module, but by default it is the same as the module name as it gets imported. so, if you wrote a `python` file `thisthing.py`, without making any change at all you would find that

```python
import thisthing
thisthing.__name__
```

would print the string 

```
'thisthing'
```

what's going on here is roughly the following:

1. the `python` interpreter sees that you want to `import thisthing`
2. it creates a "namespace" for `thisthing`
    1. a "namespace" is a segmented place where the contents of the `thisthing` can be put
        1. helps avoid naming conflicts
        2. is basically a big dictionary with "names" and the compiled objects they point to (like functions, values)
    2. a special variable `__name__` is created inside the `thisthing` module with a value `"thisthing"`
    3. all of the functions and values in `thisthing.py` are then executed and loaded into the `thisthing` namespace
    4. so, from the point of view of the code inside `thisthing.py`, `__name__` is `"thisthing"`
3. all of the items in the `thisthing` module are then made available as `thisthing.SOME_ITEM`

so when the interpreter goes to `import os`, it creates a namespace `'os'`, it creates a `__name__` value within that namespace, and it loads everything in `os`.

the end result is that there is now an object called `__name__` within the namespace `os`, aka

```python
os.__name__
```

there is one special name that doesn't correspond to a module -- that is the "script environment":

```python
__name__
```

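when the interpreter starts up for the first time, it does basically the same process, but without a module to `import`: it creates a *global* namespace, where the names of things are not prefixed with anything, and it sets the `__name__` value to `'__main__'`. the same thing happens to a file you run directly with `python thisthing.py`: its code runs in that global namespace, so inside it, `__name__` is `'__main__'`.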

#### why does this matter, though?

so, given those two facts:

1. a general module, when `import`ed, will result in a module object with a `__name__` member variable equal to the string with which it was `import`ed: `thisthing.__name__`
2. the value of `__name__` in the global scope (an interactive session, or a file run directly as a script) is `'__main__'`

what does it mean to have a block

```python
# a bunch of code
# ...
# ...
# ...

if __name__ == "__main__":
    do_a_thing()
```

??

<div align="center">***YESSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS***</div>
<img align="middle" src="http://ih0.redbubble.net/image.13413141.8561/flat,550x550,075,f.u3.jpg"></img>
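in other words: the block runs `do_a_thing()` when the file is run directly (`python thisthing.py`), and does nothing when the file is `import`ed. here's a sketch that checks both cases, assuming a hypothetical `thisthing.py` written to a temporary directory; it uses `subprocess.run(..., capture_output=True, text=True)`, which needs `python` 3.7+:

```python
import contextlib
import importlib
import io
import os
import subprocess
import sys
import tempfile

# a hypothetical module: a function, plus a guarded "script" block
MODULE_SOURCE = '''
def do_a_thing():
    return "did a thing"

if __name__ == "__main__":
    print(do_a_thing())
'''


def demo():
    """Write thisthing.py to a temp dir, then run it and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "thisthing.py")
        with open(path, "w") as f:
            f.write(MODULE_SOURCE)

        # 1. run it as a script: inside the file, __name__ == "__main__",
        #    so the guarded block runs and prints
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, check=True
        )
        script_output = result.stdout.strip()

        # 2. import it as a module: inside the file, __name__ == "thisthing",
        #    so the guarded block is skipped and nothing is printed
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # we just created a new module file
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                thisthing = importlib.import_module("thisthing")
            module_name = thisthing.__name__
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("thisthing", None)

    return script_output, buf.getvalue(), module_name


if __name__ == "__main__":
    print(demo())  # ('did a thing', '', 'thisthing')
```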

# END OF LECTURE

next lecture: [AWS identity access management (IAM)](005_iam.ipynb)