<a href="https://colab.research.google.com/github/infiniterik/python-tips/blob/main/PythonFeatures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Basics of python notebooks

## Output
By default, a notebook cell displays the value of the last expression it evaluates.

In [1]:
'Hello, world'

'Hello, world'

It also shows the output of any explicit `print` statements.

In [2]:
print('Hello')
'world' # Note that the printed Hello has no quotes, but the returned 'world' does

Hello


'world'

If your code generates an image as the last line of the cell or explicitly displays an image, it will also show that.

In [3]:
from IPython.display import Image
Image(url= "https://www.knox.edu/images/Offices/Communications/KnoxLogo/Knoxhorizpurple.jpg")

Many libraries can display rich output, such as charts or interactive widgets, in notebooks. For example, `matplotlib` can generate graphs and charts on the fly.

## Installing and Importing Libraries

In [4]:
# Install libraries from the PyPI repository
!pip install scikit-learn  # installed as scikit-learn, imported in Python as sklearn


In [5]:
# Import a library
import nltk # Some libraries are pre-installed in the Colab environment

# Import just part of a library
from nltk import corpus
from nltk.corpus import wordnet

# You can also rename an import using the "as" keyword
from nltk.corpus import wordnet as wn

## Out-of-order execution
Cells can be executed in any order. Imports, variables, and functions all persist across cell executions. This can be useful for trying things out quickly: for example, if the code in a particular cell crashes, you can fix that cell and continue running. However, it also means that you can lose track of the _state_ of your code. For example, suppose you ran the following three cells:

In [6]:
# A
mutation = 3

In [7]:
# B
mutation += 1

In [8]:
# C
mutation

4

You would get a different result (`5` instead of `4`) if you ran `A->B->B->C` than if you just ran `A->B->C`. Unless you restart the notebook, the variable `mutation` will continue to exist after you run `A`.
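Collapsing the cells into one plain script makes the effect visible. This is only a sketch of the `A->B->B->C` order, with cell `B` replayed twice:

```python
# Cell A
mutation = 3

# Cell B, run once...
mutation += 1
# ...and then run a second time
mutation += 1

# Cell C
print(mutation)  # prints 5, not 4
```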

Since notebooks naturally encourage exploration, you might find yourself creating a cell to 'try out' a line of code only to delete it when you are done. Keep in mind that anything you do in those cells may keep affecting the notebook after the cell is deleted. It is a good idea to make sure that you can rerun your entire notebook from scratch using `Runtime->Restart and run all`.

# String Formatting

Most built-in and library types in Python print a useful representation when passed directly to the `print(...)` function. For example:

In [9]:
print([1,2,3])

[1, 2, 3]


You can also print multiple things in one line, such as a string and another object:

In [10]:
print("My list is", [1,2,3])

My list is [1, 2, 3]


Sometimes you may want to build more complex formatted strings for more readable output. One way to do this is with f-strings (string literals prefixed with `f`):



In [11]:
name = "World"
print(f'Hello, {name}')

print(f'You can evaluate expressions in f-strings, too: {1}, {3*5}')
print(f"Even call functions: {len(name)}")

Hello, World
You can evaluate expressions in f-strings, too: 1, 15
Even call functions: 5


Each `{...}` in an f-string is replaced by the formatted value of the expression inside it. You can also add formatting directives after a colon inside the braces, as in `{value:directive}`. Formatting can get quite complicated, but a few pieces of syntax are particularly useful.

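The type code at the end of a directive controls how a value is presented. For example, `c` prints an `int` as the corresponding Unicode character (Python has no separate `char` type), and `f` prints it as a float: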

In [12]:
print(f'{68:c}') # format an int as the character with that code point
print(f'{68:f}') # format an int as a float (6 decimal places by default)

D
68.000000


You can also truncate or pad your outputs to fit nicely:

In [13]:
print(f'{1/3:.2f}') # limit to 2 decimal places
print(f'{1:0>2d}')  # or pad with a character

for i in range(12):
  print(f'{i:0>2d} aligned') # keep outputs aligned for easy reading

0.33
01
00 aligned
01 aligned
02 aligned
03 aligned
04 aligned
05 aligned
06 aligned
07 aligned
08 aligned
09 aligned
10 aligned
11 aligned
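Beyond zero-padding, the same directive syntax supports other fill characters, left/right/center alignment, and thousands separators. A small sketch (the values are arbitrary):

```python
total = 1234567.891

examples = [
    f'{total:,.2f}',    # thousands separator and 2 decimal places
    f'[{"left":<8}]',   # left-align in a field 8 characters wide
    f'[{"right":>8}]',  # right-align in 8 characters
    f'[{"mid":^9}]',    # center in 9 characters
    f'[{42:*^8}]',      # center, using '*' as the fill character
]

for line in examples:
    print(line)
```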


### How to translate instructions regarding `.format()` to f-strings
f-strings were added in Python 3.6, so many guides still use `.format()` instead. Most formatting directives from `.format()` work unchanged in f-strings. To translate `'{:.2f}'.format(num)` into an f-string, move the variable name into the `{}`-expression: `f'{num:.2f}'` produces the same string.
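A quick side-by-side check of the translation, including a nested field where the width itself comes from a variable (`num` and `width` are just example names):

```python
num = 2 / 3
width = 8

old_style = '{:.2f}'.format(num)
new_style = f'{num:.2f}'
print(old_style, new_style)  # both are '0.67'

# Named fields in .format() become plain expressions in the f-string,
# and nested {} fields work in both styles
old_padded = '{n:>{w}.2f}'.format(n=num, w=width)
new_padded = f'{num:>{width}.2f}'
print(repr(old_padded), repr(new_padded))
```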

# Basic iteration and Destructuring

## `range()`

In [14]:
# The basic for loop uses range() in python
lst = ["hello", "world", "!"]

for i in range(len(lst)):
  print(lst[i])

hello
world
!


`range(n)` creates an iterable of the integers from `0` up to, but not including, `n`. `range()` takes 1, 2, or 3 arguments: `range(stop)`, `range(start, stop)`, or `range(start, stop, step)`.

In [15]:
for i in range(5):
  print(i)

0
1
2
3
4


In [16]:
# take values from 3 up to (but not including) 5
for i in range(3,5):
  print(i)

3
4


In [17]:
# take every second value from 0 up to (but not including) 5
for i in range(0,5,2):
  print(i)

0
2
4
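The step can also be negative, which counts down. A `range` is lazy (it does not build a list), but it still supports `len()` and `in`:

```python
# A negative step counts down; the stop value is still excluded
countdown = list(range(5, 0, -1))
print(countdown)  # [5, 4, 3, 2, 1]

# range objects support len() and membership tests without building a list
r = range(0, 10, 3)   # 0, 3, 6, 9
print(len(r), 9 in r, 10 in r)  # 4 True False
```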


In [18]:
# you can iterate over any collection
for i in lst:
  print(i)

hello
world
!


## `enumerate()`

In [19]:
# enumerate() yields (index, value) pairs

for elements in enumerate(lst):
  print(elements[0], elements[1])

0 hello
1 world
2 !


## Unpacking

In [20]:
# Or use "unpacking" to break up lists/tuples

for index, value in enumerate(lst):
  print(index, value)

0 hello
1 world
2 !
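`enumerate()` starts counting at `0` by default. Its `start` argument changes the first index, which is handy for 1-based numbering:

```python
lst = ["hello", "world", "!"]

# start=1 makes the first index 1 instead of 0
for number, word in enumerate(lst, start=1):
    print(number, word)

numbered = list(enumerate(lst, start=1))
```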


In [21]:
# You can unpack in any assignment statement
a, b = (1,2)
print(a, b)

1 2
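A few more common unpacking patterns: swapping two variables, collecting "the rest" with `*`, and unpacking the pairs produced by `zip()`:

```python
a, b = 1, 2
a, b = b, a          # swap without a temporary variable
print(a, b)          # 2 1

first, *rest = [10, 20, 30, 40]  # * collects the remaining items into a list
print(first, rest)               # 10 [20, 30, 40]

# zip() pairs up items from several iterables, ready to unpack
for letter, number in zip("ab", [1, 2]):
    print(letter, number)
```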


# Lambdas

Lambdas are a compact way to define a function inline.

In [22]:
foo = lambda x: 3*x
foo(3)

9

Lambdas can read variables from the enclosing scope (such as globals) and can take zero or more arguments, just like any other function.

In [23]:
g = 3
bar = lambda : g
bar()

3

Lambdas can be particularly useful for small functions that you pass as an argument to other functions.
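For instance, a lambda can tell `sorted()` or `filter()` what to look at. The `pairs` data below is made up for illustration:

```python
pairs = [("pear", 3), ("apple", 5), ("fig", 1)]

# Sort by the second element of each tuple instead of the first
by_count = sorted(pairs, key=lambda p: p[1])
print(by_count)  # [('fig', 1), ('pear', 3), ('apple', 5)]

# filter() keeps the items for which the lambda returns True
big = list(filter(lambda p: p[1] > 2, pairs))
print(big)       # [('pear', 3), ('apple', 5)]
```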

# Collections

## Sets

Python has a built-in `set` class that provides an unordered collection with (average) constant-time membership checks. Sets automatically remove duplicates.

In [24]:
# Convert a list into a set
set([1,2,3,3,4,5])

{1, 2, 3, 4, 5}
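Sets also support the usual mathematical set operations through operators. A short sketch with two example sets:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(3 in a)   # membership test: True
print(a | b)    # union: {1, 2, 3, 4, 5}
print(a & b)    # intersection: {3, 4}
print(a - b)    # difference: {1, 2}
print(a ^ b)    # symmetric difference: {1, 2, 5}

empty = set()   # note: {} creates an empty dict, not an empty set
```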

## Dictionaries

Dictionaries store key/value pairs with (average) constant-time lookup by key. Each key must be unique: if you assign a value to a key that already exists, the original value is overwritten.

### Creating a dictionary

In [25]:
d = {'a': 1, 'c' : 2}

You can also use the `dict()` constructor; the keyword argument names become string keys:

In [26]:
d2 = dict(a=1, c=2)
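Both forms build the same dictionary. `dict()` can also take an iterable of `(key, value)` pairs, for example from `zip()`. Keyword syntax only works for keys that are valid identifiers, so keys such as integers need the literal form:

```python
from_literal = {'a': 1, 'c': 2}
from_kwargs = dict(a=1, c=2)

# dict() also accepts an iterable of (key, value) pairs
from_pairs = dict(zip(['a', 'c'], [1, 2]))

print(from_literal == from_kwargs == from_pairs)  # True

# Non-identifier keys (e.g. integers) need the literal syntax
int_keys = {1: 'one', 2: 'two'}
```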

### Iterating over dictionaries

You can iterate over just the keys in a dictionary

In [27]:
for k in d:
  print(k)

a
c


Or, more explicitly

In [28]:
for k in d.values():
  print(k)

1
2


Or just the values in a dictionary

In [29]:
for v in d.values():
  print(v)

1
2


Or both keys AND values simultaneously

In [30]:
for k, v in d.items():
  print(k, "->", v)

a -> 1
c -> 2


### Notes about keys

Dictionary keys must be hashable, meaning their type defines a `__hash__` method. This method returns an `int`, with the constraint that any two objects that compare equal must have the same hash value.

If you try to use an unhashable key, such as a `set`, you will get an error:

In [31]:
d[set()] = 3 # will not work

TypeError: unhashable type: 'set'
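Immutable counterparts usually work as keys: a `frozenset` instead of a `set`, or a tuple (of hashable items) instead of a list. A small sketch using a separate example dictionary:

```python
keys_demo = {}

keys_demo[frozenset({1, 2})] = 'frozenset keys work'  # immutable, hashable
keys_demo[(1, 2)] = 'tuple keys work'                 # tuples of hashable items are hashable

message = ''
try:
    keys_demo[[1, 2]] = 'list keys do not'            # lists are mutable and unhashable
except TypeError as err:
    message = str(err)
    print(message)  # mentions: unhashable type: 'list'
```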

A regular dictionary will raise a `KeyError` if you try to access a key that is not present:

In [32]:
d['b'] # will not work

KeyError: 'b'

One solution is to check for the key first with an `if` statement:

In [33]:
if 'b' in d:
  print(d['b'])
else:
  print("'b' not found")

'b' not found


Or use `.get()`, which returns a fallback value instead of raising an error:


In [34]:
d.get('b', -1) # the second argument is the value returned if the key is missing (default: None)

-1

## Convenient advanced collections

### Default Dictionary

Sometimes, you want to create a key in a dictionary when you query it. 

A `defaultdict` is a subclass of the basic `dict` type that lets you define a default value for unseen keys. Whenever a missing key is accessed with `d[key]`, the provided function is called (with no arguments) to produce the default value, which is stored under that key and returned.

In [35]:
from collections import defaultdict as ddict

default = ddict(lambda: -1)
default['a'] = 1
default['c'] = 2
default['a']

1

In [36]:
default['b']

-1

In [37]:
default['c']

2

In [38]:
default.keys()

dict_keys(['a', 'c', 'b'])

Your `defaultdict` will create an entry for the key when you query using `d[query]`. If you do not want this behavior, you should use `d.get()` instead.
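A common use is grouping: with `list` as the default factory, every new key starts with its own empty list. The word list below is only example data:

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Group words by their first letter; list() creates a fresh empty list per new key
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# .get() does not insert missing keys
print(by_letter.get('z'))  # None
print('z' in by_letter)    # False
```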

### Counters

A `Counter` is a subclass of the basic `dict` type that makes it easy to count large numbers of things. Looking up a missing key returns `0` (without adding the key), and you can combine Counters with the `+` operator.

In [39]:
from collections import Counter

first = Counter()
first['a'] # the count for `a` initializes to 0

0

In [40]:
first.keys() # simply querying the Counter did not add the key

dict_keys([])

In [41]:
first['a'] = 0 # now we have set a value
first

Counter({'a': 0})

In [42]:
first['b'] += 1 # you can increment an unseen key
first

Counter({'a': 0, 'b': 1})

You can create Counters in a variety of ways.

In [43]:
second = Counter(['b', 'c'])
second

Counter({'b': 1, 'c': 1})

In [44]:
third = Counter({'a': 1, 'b': 1})
third

Counter({'a': 1, 'b': 1})

Creating a counter from a dictionary can have unexpected results, because `Counter` does not check that the values are numbers:

In [45]:
fourth = Counter({'a': "hello", 'b': []})
fourth

Counter({'a': 'hello', 'b': []})

You can add values into counters

In [46]:
first['a'] += 2
first

Counter({'a': 2, 'b': 1})

Adding two counters together creates a new counter

In [47]:
first + second

Counter({'a': 2, 'b': 2, 'c': 1})

but leaves the originals unchanged

In [48]:
first, second

(Counter({'a': 2, 'b': 1}), Counter({'b': 1, 'c': 1}))
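Subtraction works similarly, but note that both `+` and `-` keep only positive counts. If you need zero or negative counts, the in-place `subtract()` method keeps them. The `stock`/`sold` names are just example data:

```python
from collections import Counter

stock = Counter(apples=3, pears=1)
sold = Counter(apples=1, pears=2)

# '-' keeps only positive counts, so 'pears' (1 - 2 = -1) is dropped
remaining = stock - sold
print(remaining)  # Counter({'apples': 2})

# subtract() modifies stock in place and keeps zero and negative counts
stock.subtract(sold)
print(stock)      # Counter({'apples': 2, 'pears': -1})
```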

You can also add values from other collections in bulk using `update()`

In [49]:
third.update(['a', 'b', 'c'])
third

Counter({'a': 2, 'b': 2, 'c': 1})

You can query Counters for the `most_common()` values, sorted in descending order.

In [50]:
third.most_common()

[('a', 2), ('b', 2), ('c', 1)]

Or limit yourself to a few of them


In [51]:
third.most_common(2)

[('a', 2), ('b', 2)]
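Putting these together, counting words in a piece of text takes only a couple of lines. The sentence here is made up for illustration:

```python
from collections import Counter

text = "the cat sat on the mat with the hat"
word_counts = Counter(text.split())

print(word_counts['the'])          # 3
print(word_counts.most_common(1))  # [('the', 3)]
print(word_counts['dog'])          # 0: missing words count as zero
```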

### NamedTuple

You will often find yourself passing data around in Python using tuples rather than writing a complete class. This can get very confusing very quickly. While you may not want to write a whole class, you may want to reference the elements of your tuple by name.

In [52]:
from collections import namedtuple

# Define your namedtuple
Point = namedtuple('Point', ['x', 'y']) 

# create a new Point
p = Point(1,2)

# access elements of the Point by index or name
print(p[0] == p.x)

True


In [53]:
# namedtuples also have sane string representations
print(p)

Point(x=1, y=2)


One advantage of `namedtuple` over classes is that you can still use unpacking to split up the elements of your `namedtuple`.

In [54]:
x, y = p
print(f'({x}, {y}) <-> {p}')

(1, 2) <-> Point(x=1, y=2)


A `namedtuple` can be directly compared to a regular `tuple` and is equal if it has the same values in the same order.

In [55]:
print((1,2) == p)

True
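Since namedtuples are immutable, `_replace()` returns a modified copy, and `_asdict()` converts one to a dictionary. Both are documented methods despite the leading underscore, which is there to avoid clashing with field names. The `typing.NamedTuple` class syntax is an alternative that adds type hints and defaults. `TypedPoint` below is a hypothetical example:

```python
from collections import namedtuple
from typing import NamedTuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

moved = p._replace(x=10)   # new Point; p itself is unchanged
as_dict = p._asdict()      # {'x': 1, 'y': 2}
print(moved, as_dict)

# Class-based alternative with type hints and a default value
class TypedPoint(NamedTuple):
    x: int
    y: int = 0

tp = TypedPoint(3)
print(tp)  # TypedPoint(x=3, y=0)
```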


# List Comprehensions

## Indexing and Slicing

In [56]:
lst = [12,23,142,12,45,5,187]
# You can index from the beginning or the end of a list

print(lst[1])
print(lst[-1])

23
187


In [57]:
# You can take a 'slice' of a list

print(lst[:3]) # first 3 elements of lst
print(lst[-3:]) # last 3 elements of lst

[12, 23, 142]
[45, 5, 187]


In [58]:
# lst[a:b] takes the elements from index a up to (but not including) index b

print(lst[2:5])

[142, 12, 45]


In [59]:
# You can take every nth element of a list
print(lst[::2])
print(lst[1:6:2])

[12, 142, 45, 187]
[23, 12, 5]


Slicing uses the same `start:stop:step` pattern as `range()`, so you can address exactly the elements you are interested in, without an extra layer of indices when you don't need them.
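To make the correspondence concrete, a slice picks the same elements as indexing with the matching `range()`. A negative step walks backwards, so `[::-1]` gives a reversed copy:

```python
lst = [12, 23, 142, 12, 45, 5, 187]

# lst[start:stop:step] selects the same elements as range(start, stop, step)
via_slice = lst[1:6:2]
via_range = [lst[i] for i in range(1, 6, 2)]
print(via_slice == via_range)  # True

# A negative step walks backwards
print(lst[::-1])  # [187, 5, 45, 12, 142, 23, 12]
```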

## Advanced List Comprehensions

Create a list using a for loop in one line:

In [60]:
[3*i for i in range(3)]

[0, 3, 6]

Add a conditional to your loop:

In [61]:
[i for i in range(20) if i % 2 == 1]

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
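An `if` after the loop filters items out. To transform every item instead, put a conditional expression (`x if cond else y`) before the `for`:

```python
# `if` after the loop filters...
odds = [i for i in range(6) if i % 2 == 1]

# ...while `x if cond else y` before the loop maps every item
labels = ['odd' if i % 2 else 'even' for i in range(4)]

print(odds)    # [1, 3, 5]
print(labels)  # ['even', 'odd', 'even', 'odd']
```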

You can nest for loops in a single line:

In [62]:
# The first for clause is the outer loop
[(i, j) for i in range(5) for j in range(5,10)]

[(0, 5),
 (0, 6),
 (0, 7),
 (0, 8),
 (0, 9),
 (1, 5),
 (1, 6),
 (1, 7),
 (1, 8),
 (1, 9),
 (2, 5),
 (2, 6),
 (2, 7),
 (2, 8),
 (2, 9),
 (3, 5),
 (3, 6),
 (3, 7),
 (3, 8),
 (3, 9),
 (4, 5),
 (4, 6),
 (4, 7),
 (4, 8),
 (4, 9)]
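The comprehension above is equivalent to these nested loops, written out explicitly, with the `for` clauses in the same order:

```python
pairs = []
for i in range(5):          # first `for` clause: outer loop
    for j in range(5, 10):  # second `for` clause: inner loop
        pairs.append((i, j))

same = pairs == [(i, j) for i in range(5) for j in range(5, 10)]
print(same)  # True
```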

You can also create a dictionary in a single line:

In [63]:
{i: i*5 for i in range(12)}

{0: 0,
 1: 5,
 2: 10,
 3: 15,
 4: 20,
 5: 25,
 6: 30,
 7: 35,
 8: 40,
 9: 45,
 10: 50,
 11: 55}

Or, a set:

In [64]:
{i%5 for i in range(1232)}

{0, 1, 2, 3, 4}

# max, min, sort, sum

## max and min

The functions `max()` and `min()` accept any iterable. When the elements can be compared to each other (for example numbers or strings), `max()` and `min()` return the expected values.

In [65]:
max([1,23,94,2])

94

In [66]:
min([1,23,94,2])

1

However, some types have no natural ordering (for example, two `dict`s cannot be compared with `<`), and the built-in ordering of others, such as the lexicographic ordering of `list`s, may not be the one you want. There are also situations where you want to maximize or minimize by a different criterion. In these situations, you can use the `key` argument, which takes a function of one argument that returns an orderable value; the elements are then compared by that value.

In [67]:
names = ["Alfred", "Margaret", "Dipika"]

# since len is already a function of one argument, we can pass it in directly
max(names, key=len) 

'Margaret'

In [68]:
# but lambdas are great for this purpose
max(names, key=lambda x: len(x))

'Margaret'
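`key` is also convenient for dictionaries, and `default` avoids a `ValueError` on empty input. The `scores` dictionary here is made-up example data:

```python
scores = {"Alfred": 71, "Margaret": 94, "Dipika": 88}

# Iterating a dict yields its keys, so key=scores.get ranks names by score
best = max(scores, key=scores.get)
print(best)  # Margaret

# Or compare (name, score) pairs by their second element
worst_name, worst_score = min(scores.items(), key=lambda kv: kv[1])
print(worst_name, worst_score)  # Alfred 71

# max()/min() raise ValueError on an empty iterable unless given a default
print(max([], default=None))  # None
```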

## Sorting

Sorting works much the same way, and also accepts a `key` argument. There are two primary ways to sort: `sorted()` returns a new list, while `.sort()` sorts a list in place.

In [69]:
# Will return a sorted copy of the list
sorted(names) # The default sort for strings is alphabetic

['Alfred', 'Dipika', 'Margaret']

In [70]:
# will sort the list in place
names.sort(key=len)
names

['Alfred', 'Dipika', 'Margaret']

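Note that the default sort order is ascending. To sort in descending order, pass `reverse=True` to `sorted()` or `.sort()`. Alternatively, `reversed()` returns a reversed iterator over a sequence, and the `.reverse()` method reverses a list in place.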
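A short sketch of descending order, tie-breaking with a tuple key, and `reversed()`. It uses a separate `people` list so that `names` from the cells above is left untouched:

```python
people = ["Alfred", "Margaret", "Dipika"]

print(sorted(people, reverse=True))  # ['Margaret', 'Dipika', 'Alfred']

# Longest first; ties broken alphabetically by returning a tuple from the key
by_len_then_alpha = sorted(people, key=lambda n: (-len(n), n))
print(by_len_then_alpha)             # ['Margaret', 'Alfred', 'Dipika']

# reversed() returns an iterator, so wrap it in list() to see the result
print(list(reversed(people)))        # ['Dipika', 'Margaret', 'Alfred']
```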

## Sum and join

`sum()` takes the elements of a collection and adds them together.

In [71]:
sum([1,2,3,4,5])

15

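The `+` operator has different meanings for different types. For lists it means concatenation, so `sum()` can concatenate lists if you give it an empty list as the starting value: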

In [72]:
sum([[1], [2]], []) # the second argument tells sum which starting value to use

[1, 2]

`sum()` doesn't work for strings. Instead, strings have a `.join()` method. If you call `a.join(b)`, the elements of `b` (which must be strings) are joined into a single string, separated by copies of `a`.

In [73]:
", ".join(names)

'Alfred, Dipika, Margaret'
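Because `.join()` only accepts strings, other values need converting first. And if you try `sum()` on strings anyway, Python raises a `TypeError`:

```python
numbers = [1, 2, 3]

# Convert each element to a string before joining
joined = ", ".join(str(n) for n in numbers)
print(joined)  # 1, 2, 3

# sum() refuses to add strings, even with '' as the start value
raised = False
try:
    sum(["a", "b"], "")
except TypeError as err:
    raised = True
    print(err)
```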

# Forms and progress bars

In [74]:
#@title Example form fields
#@markdown Forms support many types of fields.

no_type_checking = ''  #@param
string_type = 'example'  #@param {type: "string"}
slider_value = 131  #@param {type: "slider", min: 100, max: 200}
number = 102  #@param {type: "number"}
date = '2010-11-20'  #@param {type: "date"}
pick_me = "monday"  #@param ['monday', 'tuesday', 'wednesday', 'thursday']
select_or_input = "apples" #@param ["apples", "bananas", "oranges"] {allow-input: true}
#@markdown ---
#@markdown these values can be used in code
#@markdown
#@markdown If the value is changed in the form, it overrides the value in the code cell

print(date)

2010-11-20


In [75]:
# You can easily create a progress bar using tqdm
from time import sleep
from tqdm.notebook import tqdm

for i in tqdm([1,2,3,4,5]):
  print(i)
  sleep(1)

  0%|          | 0/5 [00:00<?, ?it/s]

1
2
3
4
5
