# Lecture 1, Data science in Neuroscience


If you don't have Jupyter Lab running on your computer, you can try running the notebooks of the course in Binder. 

Visit https://tinyurl.com/3pyz9mpb

It will take a few minutes to load. More on this later.

## Today's plan

1. Overview of the course
2. Assessment of previous experience
3. Data science landscape
4. Introduction to Python
5. NumPy and Matplotlib
6. Import data (electrophysiological data recorded from the hippocampus of a mouse)


*** 
# Overview

What is data science?

Turn raw data into understanding, insight and knowledge.

For a neuroscientist, this is what typically follows data acquisition.

![dataScience](../images/data-science.png)

Image from "R for Data science" by Wickham and Grolemund

## Course objectives

1. Identify the principal tools available in data science (focus on tools for Python)
2. Use Jupyter notebooks to mix code, figures, and text.
3. Learn how to use NumPy arrays.
4. Visualize your data using Matplotlib.
5. Understand what machine learning is and identify its principal advantages and challenges.
6. Apply a machine-learning algorithm to test a hypothesis in neuroscience.
7. Train a deep neural network to track moving objects in a video.

## Course online repository

The content of the course is in a GitHub repository.

https://github.com/kevin-allen/dataScienceNeuro

You can download all the files in a .zip file and you will have access to all the notebooks used in the lectures. 


I typically update the notebooks on Thursday morning. You can download the notebooks around 12:00 (noon) to get an up-to-date version. 

## Lecture format

1. Introduction of the main topic
2. Exercises
3. Review of exercises
4. Exercises/readings/videos between lectures to consolidate new skills
5. Review Jupyter notebooks at least once to make the new knowledge stick.

Feel free to ask me a question whenever you like.

## Running your own Jupyter server

I encourage you to run your own Jupyter server on your computer. That is the traditional way.

If you don't have a Jupyter server or a copy of the course repository on your computer, you can open the notebooks using __Binder__.

It will take 2-5 minutes, and you can run most of the code.

Try this short link: https://tinyurl.com/3pyz9mpb

Longer form: https://mybinder.org/v2/gh/kevin-allen/dataScienceNeuro/HEAD

## Supporting materials

### [DataCamp website](https://app.datacamp.com/learn)

The DataCamp platform is well suited to learning programming in R and Python, and it offers much more besides. There is an educational account that you can use for the winter semester 2023/24. Please set up your account with your real name and a stud.uni-heidelberg.de address; only these addresses are accepted.

https://www.datacamp.com/groups/shared_links/0d6a358a583e62b80edcff774cd9a182e1ee18d1dc9d8b555ee9147adbce056f

### [Python for Data Analysis (book)](https://wesmckinney.com/book/)

https://wesmckinney.com/book/

More to be shared along the way.


## Small data science project


At the end of the course, each student (alone or in a team of 2 students) chooses a small data analysis project to present on 08.12.2023.

The project will be presented as a Jupyter Notebook. 

1. Identify the spatial coding properties of neurons.
2. Use a deep neural network to track the position of a moving object (mouse, hand, etc.) in a video.
3. More to come



***

## Previous experience

A short survey of the classroom.

https://forms.gle/X8UzvjUmJGTsUyDD8


***

## Data science landscape

### Matlab

#### Pros

* Good signal processing toolbox
* Good at matrix operations

#### Cons
* Proprietary programming platform
* Expensive (yearly license)
* People have to buy a license to run your code
  
### R

#### Pros
* Open-source language
* Great for traditional statistical analysis and tabular data

#### Cons
* Less popular than Python in Neuroscience
* Lacking tools for image and signal processing

### Python

#### Pros
* Open-source language
* Most popular data science language in academia and industry
* Great for signal processing (NumPy, SciPy) or image processing (OpenCV)
* The best machine learning toolboxes (Scikit-Learn, TensorFlow, PyTorch, etc.)

#### Cons
* Conflicts between different versions of packages!
* Virtual environments might be confusing at first

***

# Introduction to Python

Python is an interpreted language. The Python interpreter runs one line after the other. A Jupyter notebook uses a Python interpreter to run the code. 

When you execute the content of a Jupyter Notebook cell, the content is sent to the Python interpreter.

We can run Python code directly in this notebook and we can display the results of calculations.

Use shift+enter to run the code (or click on Run/Run Selected Cells).

In [2]:
4+10

14

You can create variables when you need them.

In [3]:
myFirstVariable = 8

In [6]:
myFirstVariable

8

In [7]:
mySecondVariable = 9

In [8]:
myFirstVariable * mySecondVariable

72

## Automatic or tab completion

You can use tab completion to let the Python interpreter suggest names for variables. Just type the first characters and press `tab`.

In [9]:
myFirstVariable = 10

Tab completion can also be used to complete the name of files located on your computer.

Let's try to look for a file in the course repository.

In [8]:
"/home/kevin/repo/"

'/home/kevin/repo/'

## Comments
Everything from a `#` to the end of the line is a comment and will not be processed by the Python interpreter.

In [10]:
# This is a comment, the next line is printing Hello
print("Hello")

Hello


In [11]:
print("Hello") # comment on the same line as the code

Hello


## Built-in types

Python has built-in types for scalars (single values). You can use `type()` to find out the type of an object.

In [12]:
myVar = None 
myString = "Hey you"
myNumberInt = 2
myNumberFloat = 2.4
myBool = True

type(myVar),type(myString),type(myNumberInt),type(myNumberFloat),type(myBool)

(NoneType, str, int, float, bool)
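Values can often be converted from one built-in type to another. The following is a minimal sketch using the standard conversion functions `int()`, `float()`, `str()` and `bool()`; the variable names are only illustrative and do not come from the lecture notebook.

```python
# Illustrative names; the conversion functions are part of Python itself.
asInt = int("42")      # str -> int
asFloat = float(2)     # int -> float
asString = str(2.4)    # float -> str
asBool = bool(0)       # 0 converts to False, any other number to True

print(asInt, asFloat, asString, asBool)
print(type(asInt), type(asFloat), type(asString), type(asBool))
```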

## Introspection

You can use the question mark to know more about your object. 

In [13]:
?myFirstVariable

Type:        int
String form: 10
Docstring:  
int([x]) -> integer
int(x, base=10) -> integer

Convert a number or string to an integer, or return 0 if no arguments
are given.  If x is a number, return x.__int__().  For floating point
numbers, this truncates towards zero.

If x is not a number or if base is given, then x must be a string,
bytes, or bytearray instance representing an integer literal in the
given base.  The literal can be preceded by '+' or '-' and be surrounded
by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.
Base 0 means to interpret the base from the string as an integer literal.
>>> int('0b100', base=0)
4

## List [...]

Lists are a very important data structure in Python. A list is an ordered collection of elements that can be modified after it is created.

In [15]:
myList = [1,2,3,4,"a"]
myList

[1, 2, 3, 4, 'a']

If you want to make sure that this is a list, use `type()`.

In [16]:
type(myList)

list

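Type the name of an object followed by a dot and press `tab` to see which attributes and methods are available for that object.

In [18]:
myList.append(6) # typing "myList." followed by tab lists the available methods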

In [19]:
myList

[1, 2, 3, 4, 'a', 6]

In [20]:
myList.append(["f","z"])
myList

[1, 2, 3, 4, 'a', 6, ['f', 'z']]

In [21]:
[1,2,3]+[2,3]

[1, 2, 3, 2, 3]
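The cells above show that `append()` adds its argument as a single element, whereas `+` joins two lists. A small sketch of the difference is given below. It also uses `extend()`, a standard list method that is not covered above, and the variable names are only illustrative.

```python
# Illustrative names; not part of the lecture notebook.
numbers = [1, 2, 3]

nested = numbers.copy()
nested.append([4, 5])      # append adds ONE element (here, a whole list)

flat = numbers.copy()
flat.extend([4, 5])        # extend adds each element of its argument

joined = numbers + [4, 5]  # + creates a new list; numbers is unchanged

print(nested)   # [1, 2, 3, [4, 5]]
print(flat)     # [1, 2, 3, 4, 5]
print(joined)   # [1, 2, 3, 4, 5]
print(numbers)  # [1, 2, 3]
```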

To learn what a method does, use the online documentation or `?`.

In [20]:
?list.append

Signature: list.append(self, object, /)
Docstring: Append object to the end of the list.
Type:      method_descriptor

**Indexing** can be used to access a single element in a list.

In [23]:
myList[0]

1

**Slice** a list to get only part of it.

In [24]:
myList

[1, 2, 3, 4, 'a', 6, ['f', 'z']]

We need to put a range within square brackets. 

In [25]:
myList[0:2]

[1, 2]

The index starts at 0 in Python. The first element is element 0.

In [26]:
myList[2:4]

[3, 4]

The first index is included but not the last one.

You can also specify the end of a slice by counting from the end of the list with a negative index.

In [27]:
myList[0:-2]

[1, 2, 3, 4, 'a']

In [31]:
myList[-1]

['f', 'z']
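The slicing rules above can be summarized in a short sketch. The list `letters` is a made-up example, and the step (the third value in a slice) is standard Python syntax that is not introduced above.

```python
letters = ["a", "b", "c", "d", "e", "f"]  # illustrative list

first_three = letters[:3]      # start omitted: begins at index 0
last_two = letters[-2:]        # end omitted: runs to the end of the list
every_second = letters[::2]    # the third value is the step
reversed_copy = letters[::-1]  # a negative step walks backwards

print(first_three)    # ['a', 'b', 'c']
print(last_two)       # ['e', 'f']
print(every_second)   # ['a', 'c', 'e']
print(reversed_copy)  # ['f', 'e', 'd', 'c', 'b', 'a']
```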

## Tuple (...)

A tuple is an ordered group of elements. Unlike a list, it cannot be changed after it is created (it is immutable).

In [34]:
myTupple = (1,2,3,"hey")

In [35]:
myTupple[2]

3

In [36]:
myTupple[2] = 5

TypeError: 'tuple' object does not support item assignment

In [29]:
myTupple

(1, 2, 3, 'hey')
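Two common patterns with tuples are sketched below: unpacking a tuple into several variables, and building a new tuple instead of modifying an existing one. The `point` tuple is a hypothetical (x, y) position used only for illustration.

```python
point = (3.5, 7.2)  # hypothetical (x, y) position

x, y = point  # tuple unpacking: one variable per element

try:
    point[0] = 0.0  # tuples do not support item assignment
except TypeError as err:
    message = str(err)
    print("Tuples are immutable:", message)

# To "change" a tuple, build a new one from pieces of the old one.
moved = (0.0,) + point[1:]
print(moved)  # (0.0, 7.2)
```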

## Dictionary {...}

Dictionaries are used to store data values in **key:value** pairs.

In [37]:
myDict = {"names": ["Luke","Stephany","Peter"],
          "grades": [89,95,92]}

In [38]:
myDict

{'names': ['Luke', 'Stephany', 'Peter'], 'grades': [89, 95, 92]}

In [39]:
myDict["names"]

['Luke', 'Stephany', 'Peter']

In [40]:
myDict.keys()

dict_keys(['names', 'grades'])
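A few more dictionary operations are sketched below, reusing the `myDict` example from above: adding a new key, looping over key:value pairs with `items()`, and building a dictionary from two lists with `zip()`. The `passed` key, the threshold of 90 and the `gradeByName` variable are assumptions made for illustration.

```python
myDict = {"names": ["Luke", "Stephany", "Peter"],
          "grades": [89, 95, 92]}

# Add a new key. The threshold of 90 is an arbitrary assumption.
myDict["passed"] = [g >= 90 for g in myDict["grades"]]

# Loop over all key:value pairs.
for key, value in myDict.items():
    print(key, "->", value)

# Build a dictionary that maps each name to the matching grade.
gradeByName = dict(zip(myDict["names"], myDict["grades"]))
print(gradeByName["Stephany"])  # 95
```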

***
## Variables are references to objects

Variables are just names that point to objects in the computer's memory.

When assigning a variable in Python, you are creating a reference to the object on the right-hand side of the equal sign.


In [41]:
a = [1,2,3] # a is a reference for our list.
print("Initial values for a:",a)
b = a # b becomes a reference to our list 
b[0] = 4 # Change the object that b points to. 
print("New values for a:", a)

Initial values for a: [1, 2, 3]
New values for a: [4, 2, 3]


In [43]:
a[0] = 9

In [44]:
b

[9, 2, 3]
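If you want an independent list rather than a second name for the same list, you can make a copy. The sketch below contrasts the two cases. The names `alias` and `clone` are illustrative, and `list.copy()` makes a shallow copy, so nested lists inside the list would still be shared.

```python
a = [1, 2, 3]
alias = a         # a second name for the SAME list
clone = a.copy()  # a new list with the same content (shallow copy)

a[0] = 99

print(alias)  # [99, 2, 3], follows the change
print(clone)  # [1, 2, 3], unaffected
print(alias is a, clone is a)  # True False
```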

***

## Control flow

### if, elif and else

In [45]:
myVariable = 10
if myVariable > 5:
    print("myVariable is larger than 5")
else:
    print("myVariable is equal to or smaller than 5")

myVariable is larger than 5
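The example above uses only `if` and `else`. A sketch with an additional `elif` branch could look like the following. The value 5 and the `result` variable are chosen only for illustration.

```python
myVariable = 5

if myVariable > 5:
    result = "larger than 5"
elif myVariable == 5:  # tested only if the first condition is False
    result = "equal to 5"
else:                  # runs if none of the conditions above is True
    result = "smaller than 5"

print("myVariable is", result)  # myVariable is equal to 5
```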


### for loops

In [46]:
for i in [0,1,2,3,4]:
    print(i)

0
1
2
3
4


In [51]:
for i in range(2,5):
    print(i)

2
3
4


In [49]:
for i,letter in enumerate(["a","b","c","d"]):
    print(i,letter)

0 a
1 b
2 c
3 d


### while loops

In [50]:
i = 0
while i < 5:
    print(i)
    i = i+1

0
1
2
3
4


***

## List comprehensions [... for ...]

A list comprehension has the logic of a for loop and creates a list.

It uses square brackets just like a list, but contains a for loop.

In [39]:
aList = [i for i in range(5)]

In [40]:
type(aList)

list

In [41]:
aList

[0, 1, 2, 3, 4]

In [42]:
[(i+5)*2 for i in range(5)]

[10, 12, 14, 16, 18]
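To make the link with for loops explicit, the sketch below builds the same list twice, once with a loop and once with a comprehension. It also shows an optional `if` filter at the end of a comprehension, which is standard Python but not covered above. All variable names are illustrative.

```python
# Version 1: a for loop that fills a list step by step.
doubled_loop = []
for i in range(5):
    doubled_loop.append((i + 5) * 2)

# Version 2: the same logic written as a list comprehension.
doubled_comp = [(i + 5) * 2 for i in range(5)]

# A comprehension can also filter elements with a trailing if.
even_only = [i for i in range(10) if i % 2 == 0]

print(doubled_loop)  # [10, 12, 14, 16, 18]
print(doubled_comp)  # [10, 12, 14, 16, 18]
print(even_only)     # [0, 2, 4, 6, 8]
```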

***
## User-defined functions

Useful when: 
* You write a few lines of code and want to use it repeatedly
* You want to hide the details from the end user (many lines of code are executed after one line of code)

In [43]:
def printMyAddress():
    print("Lucie Smith")
    print("Untere Str. 5-7")
    print("69117 Heidelberg")
    print("Germany")

In [44]:
printMyAddress()

Lucie Smith
Untere Str. 5-7
69117 Heidelberg
Germany
Germany


In [54]:
def complexMathFunction(x):
    return (x**2)+17/2

In [56]:
complexMathFunction(x=3)

17.5
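Functions can also have a docstring and default values for their arguments. The function below is a hypothetical example written for this purpose (it is not part of the course code): it returns a mean firing rate from a spike count and a recording duration.

```python
def firingRate(nSpikes, duration=1.0):
    """Return the mean firing rate in Hz (spikes per second).

    nSpikes: number of spikes recorded
    duration: recording duration in seconds (default: 1.0)
    """
    return nSpikes / duration

print(firingRate(20))              # 20.0, uses the default duration
print(firingRate(20, duration=4))  # 5.0
```

The docstring is the text shown when you run `?firingRate` in a notebook cell.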

***
# Introduction to NumPy

![NumPy](../images/NumPy.png)

[https://numpy.org/](https://numpy.org/)

* NumPy provides N-dimensional arrays, which serve as containers for your data.
* NumPy provides many math functions.
* Its core is written in C, which makes it very fast.


## Why do you need NumPy?

* Your `raw` data will often be stored in NumPy arrays, so NumPy is essential for processing it.
* Many statistical or machine learning packages use NumPy arrays.
* Most plotting packages expect NumPy arrays as inputs.

We will work with NumPy arrays throughout this course. 

For a more complete introduction: 
* [https://numpy.org/devdocs/user/quickstart.html](https://numpy.org/devdocs/user/quickstart.html)
* [https://numpy.org/doc/stable/user/absolute_beginners.html](https://numpy.org/doc/stable/user/absolute_beginners.html)

Since NumPy is not part of the Python standard library, you need to import the package before using it.

In [59]:
import numpy as np

## NumPy Arrays

You can think of NumPy arrays as very powerful Excel sheets.


1D array

$$
\begin{aligned}
&\begin{array}{cccc}
1 & 2 & 3 & 4 \\
\end{array}
\end{aligned}
$$

2D array

$$
\begin{aligned}
&\begin{array}{cccc}
1 & 2 & 3 & 4 \\
5 & 6 & 7 & 8 \\
9 & 10 & 11 & 12 \\
\end{array}
\end{aligned}
$$

Create a 1D NumPy array with `np.array()` and a list.

In [64]:
a = np.array([1,2,3,4])
print(a)
print(type(a))

[1 2 3 4]
<class 'numpy.ndarray'>


You can create arrays filled with 0s, 1s, or with a sequence of numbers.

In [70]:
bunchOfZeroes = np.zeros(4)
print(bunchOfZeroes)
bunchOfOnes = np.ones(4)
print(bunchOfOnes)
d = np.arange(4) # sequence
print(d)

[0. 0. 0. 0.]
[1. 1. 1. 1.]
[0 1 2 3]


In [73]:
bunchOfOnes.dtype # float64, but not displayed: only the last expression of a cell is shown
d.dtype

dtype('int64')
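The data type of an array can be chosen when the array is created, or changed afterwards. The sketch below uses the `dtype` argument of `np.zeros()`, the `astype()` method and `np.linspace()`, all of which are standard NumPy; the variable names are illustrative. The exact integer type (for example `int64`) can depend on your platform.

```python
import numpy as np

ints = np.arange(4)                # integers by default
floats = np.ones(4)                # floats by default
asFloat = ints.astype(float)       # convert an existing array to float
zerosInt = np.zeros(4, dtype=int)  # choose the dtype at creation
spaced = np.linspace(0, 1, 5)      # 5 evenly spaced values from 0 to 1

print(ints.dtype, floats.dtype, asFloat.dtype, zerosInt.dtype)
print(spaced)
```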

You can do math operations with these arrays. 

In [68]:
a = np.arange(4)
print(a+1)
print(a+d)
print(a**2) # a to the power of 2

[1 2 3 4]
[0 2 4 6]
[0 1 4 9]


In [66]:
d

array([0, 1, 2, 3])

## Inspect a NumPy array

Knowing how to get information about your NumPy array is essential to fix problems.

In [69]:
print(a)
print("type:",type(a))
print("ndim:",a.ndim)
print("shape:",a.shape)
print("size:",a.size)
print("dtype:",a.dtype)

[0 1 2 3]
type: <class 'numpy.ndarray'>
ndim: 1
shape: (4,)
size: 4
dtype: int64


## 2D arrays

2D arrays have rows and columns. You can think of them as images.

* rows are horizontal
* columns are vertical

In [74]:
e = np.array([[1,2],[3,4]])
print("ndim:", e.ndim)
print("shape:",e.shape)
print(e)

ndim: 2
shape: (2, 2)
[[1 2]
 [3 4]]


In [75]:
np.arange(24)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

In [76]:
f = np.arange(24).reshape(6,4) # reshape changes the 1D array created by np.arange into a 2D array
print(f)
print("ndim:", f.ndim)
print("shape:",f.shape)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
ndim: 2
shape: (6, 4)


You can also add, subtract, multiply, or divide 2D arrays.

In [77]:
g = np.ones_like(f)
g

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

In [78]:
f+g # add one to f

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20],
       [21, 22, 23, 24]])

In [79]:
f+1 # an easier way to add one to all elements of an array

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20],
       [21, 22, 23, 24]])

## Slicing an array

You will often need to get only a fraction of an array. To do so, use `[]`.


In [80]:
a = np.arange(10)
print(a)
a[0:4] # you get the data at position 0 until position 3. You don't get 4.

[0 1 2 3 4 5 6 7 8 9]


array([0, 1, 2, 3])

In [81]:
a[:4] # you can omit 0

array([0, 1, 2, 3])

In [82]:
a[:] # you can omit start and end to get everything

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [83]:
a[0:-1] # a negative index counts from the end; -1 as the end of the slice leaves out the last number

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [84]:
a[-1] # gives you the last number

np.int64(9)

Slicing also works for 2D or ND arrays.

In 2D, the order is `[rows you want,columns you want]`. A comma separates the indices of each dimension.

In [86]:
f = np.arange(12).reshape(3,4)
f

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [87]:
f[:,0] # put a comma between indices of each dimension

array([0, 4, 8])

In [88]:
f[0:2,:]

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
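A few more 2D indexing patterns are sketched below on the same 3 x 4 array: a single element, the last column, and a rectangular block. The variable names are illustrative.

```python
import numpy as np

f = np.arange(12).reshape(3, 4)

element = f[1, 2]      # row 1, column 2
lastColumn = f[:, -1]  # all rows, last column
block = f[0:2, 1:3]    # rows 0-1 and columns 1-2

print(element)     # 6
print(lastColumn)  # [ 3  7 11]
print(block)
```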

## Built-in math functions in NumPy

There are many useful math functions in NumPy to work with arrays.

[https://numpy.org/doc/stable/reference/routines.math.html](https://numpy.org/doc/stable/reference/routines.math.html)

[https://numpy.org/doc/stable/reference/routines.statistics.html](https://numpy.org/doc/stable/reference/routines.statistics.html)

* sin(), cos()
* sum(), mean(), std()
* diff()
* ...

In [89]:
f = np.arange(12).reshape(2,6)
f

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])

For example, you can use `np.mean()` to calculate the mean of all elements of an array.

In [90]:
f.mean() # or np.mean(f)

np.float64(5.5)

You can calculate the mean across rows (one mean per column) using `axis=0`.

In [91]:
myMean = f.mean(axis=0)
myMean

array([3., 4., 5., 6., 7., 8.])

You can calculate the mean across columns (one mean per row) using `axis=1`.

In [92]:
f.mean(axis=1)

array([2.5, 8.5])
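A useful way to remember what `axis` does: the axis you pass is the one that disappears from the shape. The sketch below checks this on the same 2 x 6 array; the variable names are illustrative.

```python
import numpy as np

f = np.arange(12).reshape(2, 6)

colMeans = f.mean(axis=0)  # axis 0 (rows) is collapsed: one mean per column
rowMeans = f.mean(axis=1)  # axis 1 (columns) is collapsed: one mean per row

print(f.shape)         # (2, 6)
print(colMeans.shape)  # (6,)
print(rowMeans.shape)  # (2,)
```

Other functions such as `sum()` and `std()` accept the `axis` argument in the same way.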

We will see many more functions during the next few weeks.

## Broadcasting

This is a __complex topic__ but it is very __powerful!__

Learning how broadcasting works will simplify and speed up your Python code!

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

[https://numpy.org/doc/stable/user/basics.broadcasting.html](https://numpy.org/doc/stable/user/basics.broadcasting.html)



The simplest broadcasting example occurs when an array and a scalar value are combined in an operation. 

In [1]:
import numpy as np

In [2]:
a = np.arange(10)
print(a)
a+100 # 100 is broadcast to all elements of a

[0 1 2 3 4 5 6 7 8 9]


array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])

```
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when

* they are equal, or
* one of them is 1
```
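The rule can be written as a few lines of Python. The function `areCompatible()` below is a hypothetical helper written for illustration, not a NumPy function. The last line uses `np.broadcast_shapes()`, which is part of NumPy (version 1.20 or later is assumed) and returns the shape of the result.

```python
import numpy as np

def areCompatible(shapeA, shapeB):
    """Hypothetical helper: apply the broadcasting rule to two shapes."""
    # Compare dimensions starting from the right. zip() stops at the
    # shorter shape, so missing leading dimensions are always accepted.
    for dimA, dimB in zip(reversed(shapeA), reversed(shapeB)):
        if dimA != dimB and dimA != 1 and dimB != 1:
            return False
    return True

print(areCompatible((3, 4), (4,)))    # True: 4 and 4 are equal
print(areCompatible((3, 4), (3,)))    # False: 4 and 3 differ, neither is 1
print(areCompatible((3, 4), (3, 1)))  # True: 4 and 1, then 3 and 3
print(np.broadcast_shapes((3, 4), (3, 1)))  # (3, 4)
```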

In [4]:
a = np.arange(10).reshape(2,5)
b = np.ones(5)
print("shape of a:",a.shape)
print(a)
print("")
print("shape of b:",b.shape)
print(b)
print("")
print("a+b:")
a+b

shape of a: (2, 5)
[[0 1 2 3 4]
 [5 6 7 8 9]]

shape of b: (5,)
[1. 1. 1. 1. 1.]

a+b:


array([[ 1.,  2.,  3.,  4.,  5.],
       [ 6.,  7.,  8.,  9., 10.]])

In this case, the trailing dimensions (5 and 5) were compatible, so the addition worked.

Let's have another go with new arrays.

In [5]:
a = np.arange(10).reshape(2,5)
b = np.ones(6)
print("shape of a:",a.shape)
print(a)
print("")
print("shape of b:",b.shape)
print(b)
print("")
print("a+b:")
a+b

shape of a: (2, 5)
[[0 1 2 3 4]
 [5 6 7 8 9]]

shape of b: (6,)
[1. 1. 1. 1. 1. 1.]

a+b:


ValueError: operands could not be broadcast together with shapes (2,5) (6,) 

Let's try to subtract the mean of each row of `a` from that row.

In [13]:
print("shape of a:",a.shape)
print(a)

shape of a: (2, 5)
[[0 1 2 3 4]
 [5 6 7 8 9]]


In [14]:
a.mean(axis=1)

array([2., 7.])

In [15]:
b = a.mean(axis=1) # calculate the mean of each row
print("shape of b:",b.shape)
b

shape of b: (2,)


array([2., 7.])

Will `a-b` work?

In [16]:
a-b

ValueError: operands could not be broadcast together with shapes (2,5) (2,) 

We can't broadcast b over a because the trailing dimensions (5 and 2) are not compatible.

Remember the rule above. 

```
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when

* they are equal, or
* one of them is 1
```


In this case, we can add a dimension of 1 to b. 

In [22]:
b = np.expand_dims(b,axis=1) # add a dimension of size 1; run this cell only once, each run adds another dimension
print(b.shape)
b

(2, 1)


array([[2.],
       [7.]])

The shape of a is (2,5) and the shape of b is now (2,1). Comparing from the right, the dimensions are compatible (5 and 1; 2 and 2).

In [21]:
b.shape

(2, 1)

In [18]:
a-b

array([[-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.]])

We have successfully removed the mean of each row from each row.
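There are other ways to get an array of shape (2, 1). The sketch below shows two alternatives that are standard NumPy but not used above: the `keepdims=True` argument of `mean()`, and indexing with `np.newaxis`. The variable names are illustrative.

```python
import numpy as np

a = np.arange(10).reshape(2, 5)

# keepdims=True keeps the collapsed axis as a dimension of size 1.
rowMeans = a.mean(axis=1, keepdims=True)
print(rowMeans.shape)  # (2, 1)

centered = a - rowMeans  # broadcasting (2, 5) with (2, 1)

# Same result: add the size-1 dimension with np.newaxis.
sameResult = a - a.mean(axis=1)[:, np.newaxis]

print(centered)
print(np.array_equal(centered, sameResult))  # True
```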

__Tip: When writing code to process large arrays (real data), first write and test it with small hand-made arrays like the ones in this section. It is faster and easier to debug.__