# Projects for day 3

## General structure for all projects

Below, we give you some ideas for more advanced projects in Python. For each project, we would like you to do all of the following tasks:

#### Basic tasks

- Discuss beforehand the structure of the code that you want to write.
- Use git to keep track of your collaborative project, and upload it to GitHub to share it with your partner.
- Structure your code into modules (i.e. put functions etc. in files separate from the notebooks that use them).
- Add tests checking your code.
- Work independently on different parts of the code (e.g. on different functions, or one of you on a function and the other on its tests).
- Introduce an error into your code that is not caught by the tests, and let your partner debug the code to find it. Then improve the tests so that they also cover this error.
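As an illustration of the module/test split and of an error that slips past a test, here is a minimal sketch with a hypothetical `clip` function. In a real project the two halves would live in separate files (the names `mytools.py` and `test_mytools.py` are hypothetical), and running `pytest` in the project folder would collect the test functions.

```python
# Hypothetical module file: mytools.py
def clip(values, low, high):
    """Return a copy of `values` with every entry clipped into [low, high]."""
    clipped = []
    for v in values:
        if v < low:
            clipped.append(low)
        elif v > high:
            clipped.append(high)
        else:
            clipped.append(v)
    return clipped


# Hypothetical test file: test_mytools.py (would start with `from mytools import clip`).
# pytest collects and runs every function whose name starts with `test_`.
def test_values_inside_range_are_unchanged():
    assert clip([1, 2, 3], 0, 5) == [1, 2, 3]


def test_values_outside_range_are_clipped():
    # Only this test exercises the two clipping branches, so a bug there
    # (e.g. appending `low` where `high` is meant) slips past the first test.
    assert clip([-1, 6], 0, 5) == [0, 5]
```

With only the first test, replacing `high` by `low` in the `elif` branch would go unnoticed; adding a test like the second one is the kind of improvement asked for in the last task.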

#### Additional tasks

- Make your code a package that you can install globally on your system.

## Extract information from text

Make a module to extract information from some text stored in a string. Possible functionality could be:

- Count the number of words/letters in a text.
- Count how many times a word occurs in a text.
- Find all occurrences of a word in a text, and output each one together with its surroundings, for example the 5 words before and after it, or the whole sentence.
- Be creative!
- The text could also be provided through a filename or a URL. Adapt your code to accept different sources of text.
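One possible starting point is sketched below, assuming a deliberately crude tokenizer (a "word" is a run of letters, digits, underscores or apostrophes, compared case-insensitively) and UTF-8 text. All function names are hypothetical. `load_text` treats its argument as a URL, then as a file path, and otherwise as the text itself; note that a web page will come back with its HTML markup included.

```python
import os
import re
from collections import Counter
from urllib.request import urlopen


def load_text(source):
    """Return text from a URL, a file path, or the string itself (checked in that order)."""
    if source.startswith(("http://", "https://")):
        with urlopen(source) as response:
            return response.read().decode("utf-8", errors="replace")
    if os.path.isfile(source):
        with open(source, encoding="utf-8") as f:
            return f.read()
    return source


def words(text):
    """Split text into lowercase words using a crude regular expression."""
    return re.findall(r"[\w']+", text.lower())


def count_words(text):
    return len(words(text))


def count_letters(text):
    return sum(ch.isalpha() for ch in text)


def count_occurrences(text, word):
    return Counter(words(text))[word.lower()]


def contexts(text, word, width=5):
    """Return each occurrence of `word` with up to `width` words on either side."""
    tokens = words(text)
    target = word.lower()
    return [
        " ".join(tokens[max(0, i - width): i + width + 1])
        for i, token in enumerate(tokens)
        if token == target
    ]
```

Because every function works on a plain string, supporting a new source of text only requires extending `load_text`.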

## Peak finder

Make a module for finding and fitting resonant peaks in noisy data.

- Your procedure needs to be robust against noise; think about how you are going to find the peaks.
- Implement a generator of mock signals so that you can test your procedure systematically.
- How would you estimate the errors of the fit results?
- Does your procedure work if the points are not measured on a uniform grid?
- What about finding several peaks (a number not known in advance)?
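A minimal sketch of one approach, assuming Lorentzian-shaped peaks on a constant background and that numpy and scipy are installed. Candidate peaks are located with `scipy.signal.find_peaks` using a prominence threshold, which rejects the small noise wiggles on the flanks of a large peak that a plain height threshold would accept; each candidate is then fitted with `scipy.optimize.curve_fit` inside a fixed window. The function names, the window width and the default prominence are illustrative choices, not part of the project description.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import find_peaks


def lorentzian(x, x0, gamma, amplitude, offset):
    """Lorentzian centred at x0 with half width at half maximum |gamma|."""
    return amplitude * gamma**2 / ((x - x0) ** 2 + gamma**2) + offset


def mock_signal(x, peaks, noise=0.05, seed=None):
    """Sum of Lorentzians, given as (x0, gamma, amplitude) tuples, plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    y = sum(lorentzian(x, x0, gamma, amp, 0.0) for x0, gamma, amp in peaks)
    return y + rng.normal(0.0, noise, size=x.shape)


def find_and_fit(x, y, prominence=0.5, window=1.0):
    """Locate peaks by prominence, then fit a Lorentzian around each one.

    Returns a list of (best-fit parameters, 1-sigma uncertainties), ordered by x.
    gamma enters squared, so its fitted sign is arbitrary; use abs(gamma).
    """
    indices, _ = find_peaks(y, prominence=prominence)
    results = []
    for i in indices:
        mask = np.abs(x - x[i]) < window  # fit only the points near this peak
        p0 = [x[i], window / 4, y[i], 0.0]  # rough starting guess
        popt, pcov = curve_fit(lorentzian, x[mask], y[mask], p0=p0)
        results.append((popt, np.sqrt(np.diag(pcov))))
    return results
```

The uncertainties come from the diagonal of the covariance matrix returned by `curve_fit`, which assumes the model is correct and the noise is uncorrelated; comparing them with the spread of fitted values over many mock signals is a good sanity check. `find_peaks` works on sample indices and the fit uses the measured `x` values directly, so unevenly spaced points are handled, and the number of peaks is simply however many candidates pass the prominence cut.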

## LaTeX converter

Implement a script that tracks all the LaTeX files in one folder and compiles them into PDF files in another folder. It should run persistently, so that when a new file is added, an existing one is modified, or an old one is removed, the PDFs are updated automatically. You will need to install LaTeX for this: on Debian/Ubuntu, run `sudo apt-get install texlive` in a terminal. In a terminal, a LaTeX file is compiled into a PDF by running `pdflatex mydocument.tex`.

In this project you'll need to call command-line programs from Python. Watching a folder for changes can be done in many ways, but the easiest is probably to check for changes every couple of seconds.
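The polling approach can be sketched with the standard library alone, assuming `pdflatex` is on the `PATH` and the `.tex` files sit directly in the source folder (no subfolders). The names `snapshot`, `compile_tex`, `sync` and `watch` are hypothetical. A file counts as changed when its modification time differs from the previous check; the compile step is passed into `sync` as an argument so the bookkeeping can be tested without LaTeX installed.

```python
import subprocess
import time
from pathlib import Path


def snapshot(src):
    """Map every .tex file in `src` to its last modification time."""
    return {p: p.stat().st_mtime for p in src.glob("*.tex")}


def compile_tex(tex, out):
    """Run pdflatex on one file; auxiliary files (.aux, .log) also land in `out`."""
    result = subprocess.run(
        ["pdflatex", "-interaction=nonstopmode",
         f"-output-directory={out.resolve()}", tex.name],
        cwd=tex.parent,  # so that relative \input paths resolve as usual
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(f"pdflatex failed for {tex.name}; see {out / (tex.stem + '.log')}")


def sync(src, out, previous, compiler=compile_tex):
    """Compile new or modified files, delete PDFs of removed files, return the new state."""
    current = snapshot(src)
    for tex, mtime in current.items():
        if previous.get(tex) != mtime:  # new file, or modified since last check
            compiler(tex, out)
    for tex in previous.keys() - current.keys():  # source file was removed
        (out / f"{tex.stem}.pdf").unlink(missing_ok=True)
    return current


def watch(src, out, interval=2.0):
    """Poll `src` every `interval` seconds until interrupted (Ctrl+C)."""
    src, out = Path(src), Path(out)
    out.mkdir(parents=True, exist_ok=True)
    state = {}
    while True:
        state = sync(src, out, state)
        time.sleep(interval)
```

Calling, for example, `watch("tex", "pdf")` (hypothetical folder names) starts the loop. A single `pdflatex` pass may leave cross-references unresolved; running it twice per file is a common fix.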

## Image compression

(Useful project if you are already familiar with what an SVD is.)

Images can easily be represented as numpy arrays in Python (for example, using ``matplotlib.image.imread`` to load PNG files). A compression scheme for this data that is simple to implement is based on the singular value decomposition (SVD): an $N\times N$ matrix $A$ can be decomposed as $A = U S V^\dagger$, where $U$ and $V$ are $N\times N$ unitary matrices and $S$ is an $N\times N$ diagonal matrix with non-negative entries $s_i$ on the diagonal.

If we keep only the $M\ll N$ largest entries $s_i$ and set the remaining ones to zero, we get an approximation of $A$: $A \approx \tilde{U} \tilde{S} \tilde{V}^\dagger$, where
$\tilde{U}$ and $\tilde{V}$ are now $N\times M$ matrices (the first $M$ columns of $U$ and $V$, with the $s_i$ sorted in descending order), and $\tilde{S}$ is an $M\times M$ diagonal matrix with the largest $s_i$ on the diagonal. Storing $\tilde{U}$, $\tilde{V}$ and the $M$ values $s_i$ takes $M(2N+1)$ numbers instead of the $N^2$ entries of $A$, so for $M \ll N$ we need much less information to store the image approximately, and hence we have compressed it.

Write a module for compressing images, writing compressed images to a file, reading them back, and displaying them on the screen.
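A minimal numpy sketch of such a module, assuming a float image with values in $[0, 1]$ (which is what ``matplotlib.image.imread`` returns for PNG files), of shape `(N, K)` for greyscale or `(N, K, C)` for colour. Each colour channel is compressed separately, and `np.linalg.svd` with `full_matrices=False` also handles non-square images. The function names and the file format (a compressed `.npz` archive written with `np.savez_compressed`) are assumptions, not part of the project description.

```python
import numpy as np


def compress(image, m):
    """Truncated SVD of each channel, keeping the m largest singular values.

    For an N x K image with C channels, returns u (C, N, m), s (C, m), vh (C, m, K).
    """
    image = np.asarray(image, dtype=float)
    channels = image[..., np.newaxis] if image.ndim == 2 else image
    u, s, vh = [], [], []
    for c in range(channels.shape[-1]):
        # numpy returns the singular values sorted in descending order
        uc, sc, vhc = np.linalg.svd(channels[..., c], full_matrices=False)
        u.append(uc[:, :m])
        s.append(sc[:m])
        vh.append(vhc[:m, :])
    return np.stack(u), np.stack(s), np.stack(vh)


def decompress(u, s, vh):
    """Rebuild the approximate image U S V^T channel by channel."""
    # u[c] * s[c] scales the columns of U, i.e. computes U @ diag(s)
    channels = [(u[c] * s[c]) @ vh[c] for c in range(s.shape[0])]
    image = np.clip(np.stack(channels, axis=-1), 0.0, 1.0)  # assumes values in [0, 1]
    return image[..., 0] if image.shape[-1] == 1 else image


def save(path, u, s, vh):
    """Write the factors to a compressed .npz file, stored as float32."""
    np.savez_compressed(path, u=u.astype(np.float32),
                        s=s.astype(np.float32), vh=vh.astype(np.float32))


def load(path):
    with np.load(path) as data:
        return data["u"], data["s"], data["vh"]


def show(image):
    import matplotlib.pyplot as plt
    plt.imshow(image, cmap="gray" if image.ndim == 2 else None)
    plt.axis("off")
    plt.show()
```

A typical round trip would be `save("photo_m20.npz", *compress(img, 20))` followed by `show(decompress(*load("photo_m20.npz")))`, with a hypothetical file name. Plotting the file size and the reconstruction error against `m` is a natural way to explore the trade-off.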

## Analyse arXiv data

Use the arXiv [API](http://arxiv.org/help/api/index#python_simple_example) to make simple visualisations:

1. Count how many publications with the word **novel** in the title (or abstract) appear each day (or month, or even year), and plot the result. Compare with the word **revisit**. How often do these two words appear together?
2. Make a histogram of the lengths (number of words) of the abstracts of all papers on which your supervisor is one of the authors.

Advice:
* Look up ``feedparser.parse``; it may be useful.
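A minimal sketch of the data-gathering side, assuming the `feedparser` package is installed and the query endpoint `http://export.arxiv.org/api/query` described in the arXiv API documentation. The function names are hypothetical, and `feedparser` is imported inside `fetch` only so that the counting helpers can be used without it. Search prefixes such as `ti:` (title), `abs:` (abstract) and `au:` (author) can be combined with `AND`.

```python
from collections import Counter
from urllib.parse import urlencode

API_URL = "http://export.arxiv.org/api/query"


def query_url(search_query, start=0, max_results=100):
    """Build an arXiv API query URL, newest submissions first."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{API_URL}?{urlencode(params)}"


def fetch(search_query, start=0, max_results=100):
    """Download one page of results and return the parsed Atom entries."""
    import feedparser  # third-party: pip install feedparser
    return feedparser.parse(query_url(search_query, start, max_results)).entries


def counts_per_month(entries):
    """Count entries per month; 'published' looks like '2015-03-12T17:59:04Z'."""
    return Counter(entry.published[:7] for entry in entries)  # 'YYYY-MM'


def abstract_lengths(entries):
    """Number of whitespace-separated words in each abstract."""
    return [len(entry.summary.split()) for entry in entries]
```

For example, `fetch("ti:novel")`, `fetch("ti:revisit")` and `fetch("ti:novel AND ti:revisit")` give the three data sets for the first task (use `[:10]` instead of `[:7]` for daily counts), and an `au:` query with your supervisor's surname gives the input for the histogram. Larger result sets need several calls with increasing `start`; the API documentation asks clients to pause between successive requests.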
