# Python Essentials

This notebook will go through some Python essentials that are often overlooked but make our lives much easier when we're working with more advanced techniques.

We will be reviewing the following:

- [Data types: basic types](#basic-types)
- [Built-in functions and methods](#func-and-meth)
- [Getting help on functions/methods](#getting-help)
- [Data types: containers](#containers)
    - [Lists](#lists)
    - [Tuples](#tuples)
    - [Dictionaries](#dicts)
- [Naming Conventions - best practice](#naming-vars)

<a id='basic-types'></a> 

## Data types: basic types

Variables can store data of different types, and different types can do different things. 

You can check the type of any variable or value using `type()`.

Python has data types built-in by default:

Numeric:
* `int`: integers AKA *whole numbers*, e.g. `0`, `1`, `2`, `3`, ...
* `float`: floating point numbers AKA *numbers with decimals*, e.g. `0.1234`, `1.2354`, ...
* `bool`: booleans AKA things that can either be `True` or `False`

Non-numeric/text:
* `str`: strings AKA *text*, e.g. `"hello, this is a string"` 

**Basic data type examples**

![numbers-2.png](attachment:numbers-2.png)
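
As a minimal sketch of the types listed above, `type()` can be called on a few values. The variable names below are illustrative and not part of the original notebook.

```python
# Illustrative values; the names are made up for this sketch.
whole_number = 3
decimal_number = 1.2354
is_tasty = True
greeting = "hello, this is a string"

for value in (whole_number, decimal_number, is_tasty, greeting):
    # type() returns the class of the object; __name__ is its short name
    print(repr(value), "is of type", type(value).__name__)

# bool is a subclass of int, which is why it sits with the numeric types
print(isinstance(is_tasty, int))  # True
```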

<a id='func-and-meth'></a>
## Built-in functions & methods

As well as data types, Python has built-in functions and methods. No one expects you to remember them all, which is why Larry Page and Sergey Brin gave us Google, but here are a few worth memorising because they are well used and will save you time:

* Convert data types: `float()`, `int()`, `str()`, `bool()`, `list()`, `dict()`


* Change strings to upper/lower case: `str.upper()`, `str.lower()`


* Split a string into a list of smaller strings: `str.split()`


* Return the length of an object: `len()`


* Generate a range which can be used in a loop: `range()`

In [1]:
dr_seuss = "I do not like green eggs and ham. I do not like them Sam-I-am."
release_year = '1960.0'
copies_sold = 8000000 

In [2]:
float(copies_sold), list(release_year)

(8000000.0, ['1', '9', '6', '0', '.', '0'])

In [3]:
dr_seuss.upper(), dr_seuss.lower()

('I DO NOT LIKE GREEN EGGS AND HAM. I DO NOT LIKE THEM SAM-I-AM.',
 'i do not like green eggs and ham. i do not like them sam-i-am.')

In [4]:
print(dr_seuss.split(), '\n\n', dr_seuss.split('.'))

['I', 'do', 'not', 'like', 'green', 'eggs', 'and', 'ham.', 'I', 'do', 'not', 'like', 'them', 'Sam-I-am.'] 

 ['I do not like green eggs and ham', ' I do not like them Sam-I-am', '']


In [5]:
print(f"""
Number of characters: {len(dr_seuss)} 
    
Number of words: {len(dr_seuss.split())}
""")


Number of characters: 62 
    
Number of words: 14
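
The cells above do not exercise `int()`, `bool()` or `range()`, so here is a short sketch of how they behave. It reuses `release_year` from the earlier cell; `year_as_int` is a name introduced only for this example.

```python
release_year = '1960.0'

# int() cannot parse a string that contains a decimal point,
# so convert to float first and then truncate to int
year_as_int = int(float(release_year))
print(year_as_int)  # 1960

# bool() of a string is False only for the empty string
print(bool(''), bool('False'))  # False True

# range(start, stop, step) is lazy; wrap it in list() to see its values
print(list(range(3)))         # [0, 1, 2]
print(list(range(1, 10, 3)))  # [1, 4, 7]
```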



<a id='getting-help'></a>
## Getting help on functions/methods

You can get help on any function/method by calling `help()` (or in IPython using `?`):

In [6]:
help(str.upper)

Help on method_descriptor:

upper(self, /)
    Return a copy of the string converted to uppercase.



![len-help.png](attachment:len-help.png)

Or print the `__doc__` attribute of the function:

In [8]:
print(str.split.__doc__)

Return a list of the words in the string, using sep as the delimiter string.

  sep
    The delimiter according which to split the string.
    None (the default value) means split according to any whitespace,
    and discard empty strings from the result.
  maxsplit
    Maximum number of splits to do.
    -1 (the default value) means no limit.


<a id='containers'></a>
## Containers

Containers allow us to collect and group variables together.

<a id='lists'></a> 
## Containers: Lists

The values in a list are called items or sometimes elements.

The important properties of Python lists are as follows:

- ***Lists are ordered*** – Lists remember the order of items inserted.
- ***Accessed by index*** – Items in a list can be accessed using an index.
- ***Lists can contain any sort of object*** – It can be numbers, strings, tuples and even other lists.
- ***Lists are changeable (mutable)*** – You can change a list in-place, add new items, and delete or update existing items.
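
The properties above can be sketched in a few lines. The `shopping` list is a made-up example, not taken from the notebook.

```python
# A list can mix numbers, strings, tuples and even other lists
shopping = ['milk', 42, ('a', 'tuple'), ['a', 'nested', 'list']]

shopping[0] = 'oat milk'   # update an existing item in place
shopping.append(3.5)       # add a new item to the end
del shopping[1]            # delete an item by its index

print(shopping)
# ['oat milk', ('a', 'tuple'), ['a', 'nested', 'list'], 3.5]

# Index twice to reach inside a nested list
print(shopping[2][1])  # nested
```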

### Indexing

When indexing (counting) the items in a list, you start from 0, as in most programming languages.

You can also count backwards: the last item is at index -1, the second-to-last at -2, and so on:

```python
          days_of_week = [ 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun' ]

forward indexing:            0      1      2      3      4      5      6
    
backward indexing:          -7     -6     -5     -4     -3     -2     -1
```

Let's take a look at some examples:

In [14]:
ingredients_soup = ['parsnip','garlic','ginger','apple','leek','yellow beetroot','parsley']

In [15]:
ingredients_soup[0], ingredients_soup[-1]

('parsnip', 'parsley')

In [16]:
(ingredients_soup[4],ingredients_soup.index('leek'))

('leek', 4)

In [17]:
ingredients_soup[2:5]

['ginger', 'apple', 'leek']

In [18]:
ingredients_soup[:7:2]

['parsnip', 'ginger', 'leek', 'parsley']

In [19]:
ingredients_soup[:-6:-1]

['parsley', 'yellow beetroot', 'leek', 'apple', 'ginger']
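
The slices above all follow the pattern `list[start:stop:step]`, where `stop` is excluded and any omitted part falls back to a default. The sketch below shows a few more cases on the same list; `first_three` is a name introduced just for illustration.

```python
ingredients_soup = ['parsnip', 'garlic', 'ginger', 'apple', 'leek',
                    'yellow beetroot', 'parsley']

print(ingredients_soup[1:4])   # ['garlic', 'ginger', 'apple']
print(ingredients_soup[-2:])   # ['yellow beetroot', 'parsley']
print(ingredients_soup[::-1])  # a reversed copy of the whole list

# A slice is a new list, so changing it leaves the original untouched
first_three = ingredients_soup[:3]
first_three[0] = 'carrot'
print(ingredients_soup[0])  # parsnip
```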

### Looping through a list

It's possible to loop through lists using a for loop or list comprehension:

In [20]:
for item in ingredients_soup:
    if item.startswith('p'):
        print(item.upper())
    else:
        print(item[::-1])

PARSNIP
cilrag
regnig
elppa
keel
toorteeb wolley
PARSLEY


### List comprehensions

List comprehensions save lines of code and are often faster at run time than the equivalent `for` loop.

![list-comp-if.png](attachment:list-comp-if.png)

In [21]:
[item for item in ingredients_soup if len(item.split()) > 1]

['yellow beetroot']
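
For comparison, here is a sketch of the same filter written as a plain `for` loop. The `multi_word` name is introduced only for this example.

```python
ingredients_soup = ['parsnip', 'garlic', 'ginger', 'apple', 'leek',
                    'yellow beetroot', 'parsley']

# Loop version: create an empty list, test each item, append the matches
multi_word = []
for item in ingredients_soup:
    if len(item.split()) > 1:
        multi_word.append(item)

print(multi_word)  # ['yellow beetroot']
```

The comprehension collapses these four lines into one while reading in the same order: what to keep, where it comes from, and the condition.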

![list-comp-if-else.png](attachment:list-comp-if-else.png)

In [22]:
ingredients_risotto = ['parsley', 'onion', 'arborio rice', 'milk', 'garlic', 'lemon', 'mushroom']

In [23]:
[item.upper() if item in ingredients_risotto else item for item in ingredients_soup]

['parsnip', 'GARLIC', 'ginger', 'apple', 'leek', 'yellow beetroot', 'PARSLEY']

### `enumerate()` 
**<font color='brown'>Readability counts.</font>**

Take care when looping through lists: avoid unpythonic (C-style) code like:

In [36]:
[(i, ingredients_soup[i]) for i in range(len(ingredients_soup))]

[(0, 'parsnip'),
 (1, 'garlic'),
 (2, 'ginger'),
 (3, 'apple'),
 (4, 'leek'),
 (5, 'yellow beetroot'),
 (6, 'parsley')]

Instead opt for the more Pythonic way and use `enumerate()`:

In [48]:
[(index, item) for index, item in enumerate(ingredients_soup)]

[(0, 'parsnip'),
 (1, 'garlic'),
 (2, 'ginger'),
 (3, 'apple'),
 (4, 'leek'),
 (5, 'yellow beetroot'),
 (6, 'parsley')]
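
`enumerate()` also accepts a `start` argument, which is handy when the numbering is meant for people rather than for indexing. A small sketch, with `numbered` as an illustrative name:

```python
ingredients_soup = ['parsnip', 'garlic', 'ginger', 'apple', 'leek',
                    'yellow beetroot', 'parsley']

# Count from 1 instead of 0 for a human-readable numbered list
numbered = [f"{position}. {item}"
            for position, item in enumerate(ingredients_soup, start=1)]
print(numbered[0])   # 1. parsnip
print(numbered[-1])  # 7. parsley
```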

<a id='tuples'></a>
## Containers: Tuples

Tuples are a lot like [lists](#lists) except for the last point:

- ***Tuples are immutable*** – you can’t add, delete, or change items after the tuple is defined.

**So where you can alter a list:**

```python
>>> scores = [34, 54, 2, 20, 10]
# add 3 to the list
>>> scores.append(3)
>>> scores
[34, 54, 2, 20, 10, 3]

# remove 20 from the list
>>> scores.remove(20)
>>> scores
[34, 54, 2, 10, 3]
```

---
**You can't do this with a tuple:**
```python
>>> scores_tup = (34, 54, 2, 20, 10)
>>> scores_tup.append(3)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-24-755c8cefcf95> in <module>
      1 scores = (34, 54, 2, 20, 10)
----> 2 scores.append(3)
      3 scores

AttributeError: 'tuple' object has no attribute 'append'
```
---
**Unless you build a new tuple and rebind the variable to it:**

```python
>>> scores_tup = (34, 54, 2, 20, 10)
>>> scores_tup = scores_tup + (3,)
>>> scores_tup
(34, 54, 2, 20, 10, 3)
```
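
It may help to see that the `+` above does not change the original tuple at all. In this sketch, `original` is a second name (introduced for illustration) that keeps pointing at the first tuple:

```python
scores_tup = (34, 54, 2, 20, 10)
original = scores_tup             # a second name for the same tuple

scores_tup = scores_tup + (3,)    # builds a NEW tuple, then rebinds the name

print(original)                # (34, 54, 2, 20, 10)
print(scores_tup)              # (34, 54, 2, 20, 10, 3)
print(original is scores_tup)  # False

# Changing an item in place is still not allowed
try:
    original[0] = 99
except TypeError as error:
    print(error)
```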

<a id='dicts'></a>
## Containers: Dictionaries
Unordered* collection of **key**-**value** pairs.

![dict-key-value.png](attachment:dict-key-value.png)

**<font color='brown'>Explicit is better than implicit.</font>**

*Since Python 3.7, dictionaries preserve insertion order as a language feature. In 3.6 this is only an implementation detail, and in earlier versions it is not true.

**If being ordered is important to your implementation**, the best way to create a new dictionary is to use `OrderedDict()` until Python versions of 3.6 or lower are no longer in use.

There are two ways to update a dictionary:

```python
# my_dict is a placeholder for any existing dictionary
my_dict['new_key'] = 'new_value'
# or
my_dict.update({'new_key': 'new_value'})
```

In [24]:
from collections import OrderedDict

ordered_info = OrderedDict({'name': 'Joe Bloggs',
                            'scores': [45, 20, 54, 32, 90],
                            'email': 'joebloggs@net.com'})
ordered_info['age'] = 54
ordered_info

OrderedDict([('name', 'Joe Bloggs'),
             ('scores', [45, 20, 54, 32, 90]),
             ('email', 'joebloggs@net.com'),
             ('age', 54)])

In [25]:
info = {'name': 'Joe Bloggs',
        'scores': [45, 20, 54, 32, 90],
        'email': 'joebloggs@net.com'}

info.update({'age': 54})  
info

{'name': 'Joe Bloggs',
 'scores': [45, 20, 54, 32, 90],
 'email': 'joebloggs@net.com',
 'age': 54}
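
One practical difference between the two, sketched below under the assumption of Python 3.7 or later: a plain `dict` keeps insertion order but ignores it when comparing, whereas two `OrderedDict` objects are only equal if their order matches too. The names `first` and `second` are illustrative.

```python
from collections import OrderedDict

first = {'name': 'Joe Bloggs', 'age': 54}
second = {'age': 54, 'name': 'Joe Bloggs'}

# Iterating a dict yields keys in insertion order (Python 3.7+)
print(list(first))   # ['name', 'age']
print(list(second))  # ['age', 'name']

# Plain dicts compare by content only
print(first == second)  # True

# OrderedDicts compare by content AND order
print(OrderedDict(first) == OrderedDict(second))  # False
```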

### Looping through a dictionary

You can access the keys/values using:
- `dict.items()`: Both keys and values
- `dict.keys()`: Just keys
- `dict.values()`: Just values

In [26]:
[(key, val) for key, val in info.items()]

[('name', 'Joe Bloggs'),
 ('scores', [45, 20, 54, 32, 90]),
 ('email', 'joebloggs@net.com'),
 ('age', 54)]

In [45]:
[key.upper() for key in info.keys()]

['NAME', 'SCORES', 'EMAIL', 'AGE']

In [44]:
[val for val in info.values() if not str(val).isdigit()]

['Joe Bloggs', [45, 20, 54, 32, 90], 'joebloggs@net.com']

### Dictionary comprehensions

We can use comprehensions with dictionaries just as we did with lists. Just take care with the syntax:

![dict-comp-if.png](attachment:dict-comp-if.png)

In [29]:
{index: item for index, item in enumerate(ingredients_soup)}

{0: 'parsnip',
 1: 'garlic',
 2: 'ginger',
 3: 'apple',
 4: 'leek',
 5: 'yellow beetroot',
 6: 'parsley'}

![dict-comp-if-else.png](attachment:dict-comp-if-else.png)

In [30]:
ingredients_risotto

['parsley', 'onion', 'arborio rice', 'milk', 'garlic', 'lemon', 'mushroom']

In [31]:
amounts_risotto = ['small bunch', 1, '300g', '125ml', '3 bulbs', 2, '500g']

In [38]:
{ing: amt if str(amt).isdigit() else amt.upper() for ing, amt in zip(ingredients_risotto, amounts_risotto)}

{'parsley': 'SMALL BUNCH',
 'onion': 1,
 'arborio rice': '300G',
 'milk': '125ML',
 'garlic': '3 BULBS',
 'lemon': 2,
 'mushroom': '500G'}
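
When no transformation is needed, a comprehension is not required at all: `dict()` accepts the pairs produced by `zip()` directly. A minimal sketch using the same two lists (`recipe` is a name chosen for this example):

```python
ingredients_risotto = ['parsley', 'onion', 'arborio rice', 'milk',
                       'garlic', 'lemon', 'mushroom']
amounts_risotto = ['small bunch', 1, '300g', '125ml', '3 bulbs', 2, '500g']

# zip() pairs items by position; dict() turns the pairs into key-value entries
recipe = dict(zip(ingredients_risotto, amounts_risotto))
print(recipe['milk'])  # 125ml

# zip() stops at the shortest input, so unmatched items are silently dropped
print(dict(zip(['a', 'b', 'c'], [1, 2])))  # {'a': 1, 'b': 2}
```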

### Collections

We've already seen `OrderedDict()`, but the `collections` module has others. Let's look at `Counter`:

In [47]:
from collections import Counter

count_words_dr_seuss = Counter(dr_seuss.split())
count_words_dr_seuss

Counter({'I': 2,
         'do': 2,
         'not': 2,
         'like': 2,
         'green': 1,
         'eggs': 1,
         'and': 1,
         'ham.': 1,
         'them': 1,
         'Sam-I-am.': 1})

In [34]:
# the n most common words
count_words_dr_seuss.most_common(2)

[('I', 2), ('do', 2)]

In [35]:
most_common_letter = Counter(dr_seuss.replace(' ','')).most_common(1)
most_common_letter

[('e', 6)]
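
Note that the word counts above treat `'ham.'` (with its full stop) as its own word. One possible clean-up, sketched here as an assumption about what you might want rather than as part of the original notebook, is to strip the full stops and lower-case each word before counting:

```python
from collections import Counter

dr_seuss = "I do not like green eggs and ham. I do not like them Sam-I-am."

# Strip surrounding full stops and ignore case before counting
words = [word.strip('.').lower() for word in dr_seuss.split()]
word_counts = Counter(words)

print(word_counts['i'])    # 2
print(word_counts['ham'])  # 1

# Unlike a plain dict, a Counter returns 0 for keys it has never seen
print(word_counts['spam'])  # 0
```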

<a id='naming-vars'></a>
## Naming Conventions

A variable lets you store data, and you assign one using `=`. The following are simple rules that we can (and should!) follow when naming variables (and anything else we name, really!):

1. **<font color='brown'>Use Intention-Revealing Names</font>**
    - Choosing good names takes time but saves more than it takes
   

   
2. **<font color='brown'>Avoid Disinformation</font>**
    - We should avoid words whose entrenched meanings vary from our intended meaning
    

3. **<font color='brown'>Use Searchable Names</font>**
    - Single-letter names are not easy to locate across a body of text. The length of a name should correspond to the size of its scope; though never `l` or `O`, which are easily mistaken for `1` and `0`!
    

    
4. **<font color='brown'>Don't be cute (even when it's very tempting)</font>**
    - If names are too clever, they will be memorable only to people who share the author’s sense of humor, and only as long as these people remember the joke
    

***What is wrong with the following?***

In [11]:
a = ['apple','banana','clementine','fig']

In [10]:
list = ['pawn','rook','knight','bishop','queen','king']

In [9]:
import pandas as pd

df = pd.DataFrame({'Harry': ['O','E','E','E','E','E','A'],
                   'Hermione': ['E','O','O','O','O','O','O'],
                   'Ron': ['E','E','E','E','E','E','A']},
                  index=['DAtDA', 'CoMC', 'Chms', 'Herb', 'Pot', 'Tfig', 'Astr'])

In [12]:
def kamikazee(df, cols):
    return df.drop(cols, axis='columns')
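
One possible reading of the first two cells, offered as a sketch rather than the notebook's official answer: `a` reveals nothing about its contents, and `list` hides a built-in. The replacement names `fruits` and `chess_pieces` and the helper `shadowing_demo` are suggestions made up for this example. In the same spirit, `df` and `kamikazee` could become something like `exam_grades` and `drop_columns`.

```python
# Descriptive names say what each collection holds
fruits = ['apple', 'banana', 'clementine', 'fig']
chess_pieces = ['pawn', 'rook', 'knight', 'bishop', 'queen', 'king']


def shadowing_demo():
    # Naming a variable `list` hides the built-in list() within this scope
    list = ['pawn', 'rook']
    try:
        list('abc')  # the built-in is no longer reachable by this name
    except TypeError as error:
        return str(error)


print(shadowing_demo())  # 'list' object is not callable
```

The shadowing is confined to a function here so that the built-in `list()` keeps working in the rest of the notebook.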

***Conclusion***

The hardest thing about choosing good names is that it requires **good descriptive skills** and a **shared cultural background**. This is a teaching issue rather than a technical, business, or management issue. As a result, many people in this field don’t learn to do it very well.

***If we all checked-in our code a little cleaner than when we checked it out, the code simply could not rot.***

Don't be afraid to make or suggest a change if you think of a name that's better! 