# Quickstart

This tutorial is a quick guide to using the **Garpar** system.


## Interactive Version

Launch Binder for an interactive version of this tutorial!

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/quatrope/garpar/HEAD?filepath=%2Fdocs%2Fsource%2Ftutorial.ipynb)

## Imports

There are two important modules in **Garpar**:

- `optimize` is a subpackage of garpar for applying optimization models.
- `datasets` is a subpackage of garpar for simulating markets with different parameters.

In [1]:
from garpar import datasets, optimize

## The StocksSet class

Most of the time, the modules of the system exchange data through the `StocksSet` class. An instance has the following representation:

In [8]:
datasets.make_risso_normal()

Stocks,"S0[W 1.0, H 0.5]","S1[W 1.0, H 0.5]","S2[W 1.0, H 0.5]","S3[W 1.0, H 0.5]","S4[W 1.0, H 0.5]","S5[W 1.0, H 0.5]","S6[W 1.0, H 0.5]","S7[W 1.0, H 0.5]","S8[W 1.0, H 0.5]","S9[W 1.0, H 0.5]"
Days,,,,,,,,,,
0,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000
1,100.183198,99.932585,100.142072,99.933852,100.114148,100.253151,100.271272,100.517361,100.250841,99.796565
2,100.385626,99.764453,99.900055,99.723824,100.230444,100.110850,100.590582,100.611480,100.613890,99.707641
3,100.356632,100.121197,99.816914,99.760569,100.601390,100.186639,100.573449,100.692344,100.637941,99.710578
4,100.498137,99.917107,99.984346,99.668109,100.824156,100.021424,100.677015,100.886673,100.675017,99.455994
...,...,...,...,...,...,...,...,...,...,...
361,105.077697,99.273832,104.678911,91.828727,105.449323,97.254607,97.429498,98.256551,102.270445,99.241396
362,104.941461,99.222518,104.915777,92.070253,105.505055,97.146057,97.336044,98.473633,102.017598,99.290740
363,105.222544,99.598494,104.899396,91.942721,105.702557,96.931186,97.464649,97.820726,101.725805,99.169302
364,105.137702,99.780072,104.605639,92.229847,105.879823,97.013228,97.530581,97.589438,101.826549,99.290081


The definition requires some context to fully understand. For now, think of it as a combination of a **market** and a **portfolio**. Each stock has a corresponding price, and there are weights, denoted by `W`, alongside each `S0`, `S1`, ..., `S9`. These weights represent the percentage of the budget allocated to the stock they are attached to.

There are in-depth guides for both `optimize` and `datasets`. In this guide we will only scratch the surface with a basic example of each.

For this tutorial we will simulate a market and apply an optimization model. With that we will conclude the quickstart.

We can simulate a market by calling the function `make_risso_normal` in `datasets`, which runs a simulation with default parameters. To make the result reproducible, we will set only one parameter, `random_state`:

In [10]:
ss = datasets.make_risso_normal(random_state=23)
ss

Stocks,"S0[W 1.0, H 0.5]","S1[W 1.0, H 0.5]","S2[W 1.0, H 0.5]","S3[W 1.0, H 0.5]","S4[W 1.0, H 0.5]","S5[W 1.0, H 0.5]","S6[W 1.0, H 0.5]","S7[W 1.0, H 0.5]","S8[W 1.0, H 0.5]","S9[W 1.0, H 0.5]"
Days,,,,,,,,,,
0,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000
1,100.233936,100.126354,100.056170,100.479517,99.721932,100.231624,100.454794,100.363444,100.381520,99.850608
2,100.239643,99.982188,99.987262,100.478348,99.664800,100.226893,100.457623,100.091836,100.758777,99.705687
3,100.619143,100.057865,99.839579,100.528209,99.469437,99.982587,100.372239,100.224793,100.651979,99.700380
4,100.654832,100.278884,99.729389,100.543373,99.308332,100.188572,100.684870,99.963426,100.492926,99.320826
...,...,...,...,...,...,...,...,...,...,...
361,99.510508,101.112485,104.388966,100.728362,106.476547,100.745743,101.622384,97.238908,102.661988,97.320508
362,99.513739,101.224681,104.222790,100.589587,106.637348,100.936895,102.013803,97.519343,102.848153,97.376270
363,99.903102,101.276511,104.231903,100.804729,106.589381,100.957299,102.060671,97.372291,102.612551,97.047446
364,99.909110,101.197052,104.360405,100.949056,106.400243,100.870183,102.212970,97.143356,102.827934,97.182677
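
The effect of fixing `random_state` can be pictured with a small, library-free sketch. It does not reproduce Garpar's simulation: the `simulate_prices` helper, its random-walk model, and the `sigma` parameter are hypothetical and only illustrate that a fixed seed yields the same simulated market on every run.

```python
import random


def simulate_prices(days, n_stocks, seed, start=100.0, sigma=0.2):
    """Toy random walk; NOT Garpar's algorithm, only a seeding demo."""
    rng = random.Random(seed)  # private generator, independent of global state
    prices = [[start] * n_stocks]  # every stock starts at the same price
    for _ in range(days - 1):
        previous = prices[-1]
        prices.append([p + rng.gauss(0.0, sigma) for p in previous])
    return prices


first = simulate_prices(days=5, n_stocks=3, seed=23)
second = simulate_prices(days=5, n_stocks=3, seed=23)
other = simulate_prices(days=5, n_stocks=3, seed=24)

print(first == second)  # compares two runs that share a seed
print(first == other)   # compares two runs with different seeds
```

Passing the same seed twice gives two identical price tables, which is why the tutorial fixes `random_state` before comparing results.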


Now that we have a `StocksSet` instance, let's apply an optimization model and see how the weights change.

In [12]:
mk = optimize.mean_variance.Markowitz(target_risk=0.01)
mk.optimize(ss)

Stocks,"S0[W 0.092830, H 0.5]","S1[W 0.084328, H 0.5]","S2[W 0.128854, H 0.5]","S3[W 0.081310, H 0.5]","S4[W 0.154233, H 0.5]","S5[W 0.106193, H 0.5]","S6[W 0.122956, H 0.5]","S7[W 0.046754, H 0.5]","S8[W 0.124695, H 0.5]","S9[W 0.057846, H 0.5]"
Days,,,,,,,,,,
0,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000
1,100.233936,100.126354,100.056170,100.479517,99.721932,100.231624,100.454794,100.363444,100.381520,99.850608
2,100.239643,99.982188,99.987262,100.478348,99.664800,100.226893,100.457623,100.091836,100.758777,99.705687
3,100.619143,100.057865,99.839579,100.528209,99.469437,99.982587,100.372239,100.224793,100.651979,99.700380
4,100.654832,100.278884,99.729389,100.543373,99.308332,100.188572,100.684870,99.963426,100.492926,99.320826
...,...,...,...,...,...,...,...,...,...,...
361,99.510508,101.112485,104.388966,100.728362,106.476547,100.745743,101.622384,97.238908,102.661988,97.320508
362,99.513739,101.224681,104.222790,100.589587,106.637348,100.936895,102.013803,97.519343,102.848153,97.376270
363,99.903102,101.276511,104.231903,100.804729,106.589381,100.957299,102.060671,97.372291,102.612551,97.047446
364,99.909110,101.197052,104.360405,100.949056,106.400243,100.870183,102.212970,97.143356,102.827934,97.182677
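
To make the meaning of the optimized weights concrete, the sketch below splits a budget according to the `W` values printed above. It uses plain Python rather than the Garpar API. The budget of 10,000 and the "buy on day 0 and hold" interpretation are assumptions made only for illustration.

```python
# Weights and day-364 prices copied from the optimized StocksSet output above.
weights = {
    "S0": 0.092830, "S1": 0.084328, "S2": 0.128854, "S3": 0.081310,
    "S4": 0.154233, "S5": 0.106193, "S6": 0.122956, "S7": 0.046754,
    "S8": 0.124695, "S9": 0.057846,
}
last_prices = {
    "S0": 99.909110, "S1": 101.197052, "S2": 104.360405, "S3": 100.949056,
    "S4": 106.400243, "S5": 100.870183, "S6": 102.212970, "S7": 97.143356,
    "S8": 102.827934, "S9": 97.182677,
}
initial_price = 100.0  # every stock starts at 100 on day 0
budget = 10_000.0      # hypothetical budget, chosen only for illustration

# Money assigned to each stock: its weight times the total budget.
allocation = {stock: w * budget for stock, w in weights.items()}

# Units bought on day 0 (assumed), and what they are worth on day 364.
units = {stock: amount / initial_price for stock, amount in allocation.items()}
final_value = sum(units[stock] * last_prices[stock] for stock in weights)

print(f"weights sum to {sum(weights.values()):.6f}")
for stock, amount in allocation.items():
    print(f"{stock}: {amount:8.2f}")
print(f"portfolio value on day 364: {final_value:.2f}")
```

Because the optimized weights add up to one (within rounding), the per-stock amounts add up to the whole budget.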


This concludes the quickstart. Simulation and optimization are the key concepts of the system.

### The Galaxy Class

We will create a stripped-down version of the Galaxy class from the [Galaxy-Chop](https://github.com/vcristiani/galaxy-chop) project.

It will have only 8 attributes. The first 7 will have units attached and will be implemented with `uttr.ib`.
These are:

* `x`, `y`, `z`: The positions of the particles (typically stars) relative to the center of the galaxy, measured in kiloparsecs ($kpc$).
* `vx`, `vy`, `vz`: The relative velocity components of the particles, measured in $km/s$.
* `m`: Masses of the particles in units of solar masses ($M_\odot$).

The last attribute, `notes`, is a free-text description of the galaxy and can be implemented with the standard *attrs* library.

In [3]:
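import astropy.units as u
import attr
import uttr


@uttr.s
class Galaxy:
    # Positions; plain values default to kiloparsecs.
    x = uttr.ib(unit=u.kpc)
    y = uttr.ib(unit=u.kpc)
    z = uttr.ib(unit=u.kpc)

    # Velocity components; plain values default to km/s.
    vx = uttr.ib(unit=u.km / u.s)
    vy = uttr.ib(unit=u.km / u.s)
    vz = uttr.ib(unit=u.km / u.s)

    # Masses; plain values default to solar masses.
    m = uttr.ib(unit=u.M_sun)

    # Free-text description, handled by plain attrs (no unit attached).
    notes = attr.ib(validator=attr.validators.instance_of(str))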

### Galaxy with Default Units

Now that we have created our class, we can go ahead and create an object of type *Galaxy*.

To keep it simple, let's assume only 4 particles with totally arbitrary numbers on each attribute.

Part of the power of *uttrs* is its ability to assign default units when none are provided, or to validate that a provided unit is physically compatible with the default.

Let's see first an example in which all units are assigned automatically.

In [None]:
gal = Galaxy(
    x=[1, 1, 3, 4],
    y=[10, 2, 3, 100],
    z=[1, 1, 1, 1],
    vx=[1000, 1023, 2346, 1334],
    vy=[9956, 833, 954, 1024],
    vz=[1253, 956, 1054, 3568],
    m=[200, 100, 20, 5],
    notes="A random galaxy with arbitrary numbers.",
)

Let's verify that all attributes of the class were given the correct units.

In [None]:
gal.x

<Quantity [1., 1., 3., 4.] kpc>

In [None]:
gal.y

<Quantity [ 10.,   2.,   3., 100.] kpc>

In [None]:
gal.vx

<Quantity [1000., 1023., 2346., 1334.] km / s>

In [None]:
gal.m

<Quantity [200., 100.,  20.,   5.] solMass>

In [None]:
gal.notes

'A random galaxy with arbitrary numbers.'

### Galaxy with Explicit Units

An alternative is to provide units that are compatible with the default unit.
In this case, we have to be mindful of the physical equivalence between the units we pass in and the ones declared when the class was created.

For example, we could suggest that the dimension `z` be given in parsecs, `vy` in $km/h$ and masses in $kg$.

In [None]:
gal = Galaxy(
    x=[1, 1, 3, 4],
    y=[10, 2, 3, 100],
    z=[1000, 1000, 1000, 1000] * u.parsec,
    vx=[1000, 1023, 2346, 1334],
    vy=[9956, 833, 954, 1024] * (u.km / u.h),
    vz=[1253, 956, 1054, 3568],
    m=[200, 100, 20, 5] * u.kg,
    notes="A random galaxy with arbitrary numbers.",
)

As shown above, this works as expected without error.
We can then access any of the attributes and verify that they keep the units we provided.

In [None]:
gal.z  # parsecs

<Quantity [1000., 1000., 1000., 1000.] pc>

In [None]:
gal.m  # kg

<Quantity [200., 100.,  20.,   5.] kg>

In [None]:
gal.vx  # default km/s

<Quantity [1000., 1023., 2346., 1334.] km / s>

In [None]:
gal.vy  # km/h

<Quantity [9956.,  833.,  954., 1024.] km / h>

On the other hand, if we try to input a unit that is incompatible with the suggested input unit, a `ValueError` exception is raised.

To show this, let's try to assign `x` values with units of grams ($g$).

In [None]:
gal = Galaxy(
    x=[1, 1, 3, 4] * u.g,
    y=[10, 2, 3, 100],
    z=[1000, 1000, 1000, 1000] * u.parsec,
    vx=[1000, 1023, 2346, 1334],
    vy=[9956, 833, 954, 1024] * (u.km / u.h),
    vz=[1253, 956, 1054, 3568],
    m=[200, 100, 20, 5] * u.kg,
    notes="A random galaxy with arbitrary numbers.",
)

ValueError: Unit of attribute 'x' must be equivalent to 'kpc'. Found 'g'.
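
The two behaviors described so far (falling back to a default unit, and rejecting an incompatible one) can be pictured with a small, library-free sketch. It is not how *uttrs* is implemented, since the real library works with Astropy units. The `DIMENSIONS` table and the `validate_unit` helper are hypothetical names introduced here for illustration.

```python
# Hand-written stand-in for a unit library: maps each unit to its dimension.
DIMENSIONS = {
    "kpc": "length",
    "pc": "length",
    "kg": "mass",
    "g": "mass",
}


def validate_unit(name, values, unit, default_unit):
    """Return (values, unit); use the default unit when none is given."""
    if unit is None:
        return values, default_unit
    # Units are compatible only if they measure the same dimension.
    if DIMENSIONS[unit] != DIMENSIONS[default_unit]:
        raise ValueError(
            f"Unit of attribute '{name}' must be equivalent to "
            f"'{default_unit}'. Found '{unit}'."
        )
    return values, unit


print(validate_unit("x", [1, 1, 3, 4], None, "kpc"))  # default unit assigned
print(validate_unit("z", [1000, 1000], "pc", "kpc"))  # compatible unit kept
try:
    validate_unit("x", [1, 1, 3, 4], "g", "kpc")      # mass where length is expected
except ValueError as error:
    print(error)
```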

## Automatic Coercion of Units: Array Accessor

One powerful feature of *uttrs* is the ability to easily convert all unit-aware attributes to plain `numpy.ndarray` objects expressed in the default units.

This is achieved using the `uttr.array_accessor()` function.
It provides uniform access to the attributes defined with uttrs, in a data structure that has faster access time than its counterpart with units.

By default, the `@uttr.s` decorator automatically adds an array accessor (named `arr_`) to the decorated class. You can disable this functionality with
`@uttr.s(aaccessor=None)`, or change the name of the property with `@uttr.s(aaccessor="other_name")`.


Expanding on the previous example:

In [None]:
@uttr.s
class Galaxy:
    x = uttr.ib(unit=u.kpc)
    y = uttr.ib(unit=u.kpc)
    z = uttr.ib(unit=u.kpc)

    vx = uttr.ib(unit=u.km / u.s)
    vy = uttr.ib(unit=u.km / u.s)
    vz = uttr.ib(unit=u.km / u.s)

    m = uttr.ib(unit=u.M_sun)

    notes = attr.ib(validator=attr.validators.instance_of(str))

Let's instantiate the class again with some parameters with custom units.

In [None]:
gal = Galaxy(
    x=[1, 1, 3, 4],
    y=[10, 2, 3, 100],
    z=[1000, 1000, 1000, 1000] * u.parsec,
    vx=[1000, 1023, 2346, 1334],
    vy=[9956, 833, 954, 1024] * (u.km / u.h),
    vz=[1253, 956, 1054, 3568],
    m=[200, 100, 20, 5] * u.kg,
    notes="A random galaxy with arbitrary numbers.",
)

If we now access `z` through our `arr_` accessor, *uttrs* will convert the values in parsec units to kiloparsecs and return a uniform numpy array.

In [None]:
gal.arr_.z

Meanwhile, accessing `z` directly still returns the values in their original unit.

In [None]:
gal.z

The same applies to `vy` and `m`.

In [None]:
gal.arr_.m

In [None]:
gal.arr_.vy

If we try to access an attribute that was not defined with `uttr.ib`, an `AttributeError` exception is raised.

In [None]:
gal.arr_.notes
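
The accessor behavior described in this section can be approximated with a short, library-free sketch. It is a toy model and not the *uttrs* implementation: `ToyGalaxy`, `ArrayAccessor`, and the `DIVISORS` table are hypothetical, values are stored as `(values, unit)` pairs instead of Astropy quantities, and plain lists stand in for NumPy arrays.

```python
# Divisors that convert a value from the given unit to the default unit.
DIVISORS = {
    ("kpc", "kpc"): 1.0,
    ("pc", "kpc"): 1000.0,         # 1000 pc = 1 kpc
    ("km / s", "km / s"): 1.0,
    ("km / h", "km / s"): 3600.0,  # 3600 s = 1 h
}


class ArrayAccessor:
    """Expose unit-aware attributes as plain numbers in their default unit."""

    def __init__(self, owner):
        self._owner = owner

    def __getattr__(self, name):
        # Only called for names not found on the accessor itself.
        defaults = self._owner.default_units
        if name not in defaults:
            raise AttributeError(f"No unit-aware attribute {name!r}")
        values, unit = getattr(self._owner, name)
        divisor = DIVISORS[(unit, defaults[name])]
        return [value / divisor for value in values]


class ToyGalaxy:
    default_units = {"z": "kpc", "vy": "km / s"}

    def __init__(self, z, vy, notes):
        self.z = z          # (values, unit) pair
        self.vy = vy        # (values, unit) pair
        self.notes = notes  # plain string, no unit attached
        self.arr_ = ArrayAccessor(self)


gal = ToyGalaxy(
    z=([1000.0, 1000.0], "pc"),
    vy=([3600.0, 7200.0], "km / h"),
    notes="A toy galaxy.",
)
print(gal.arr_.z)   # values converted to kpc
print(gal.z)        # original values and unit are untouched
print(gal.arr_.vy)  # values converted to km / s
try:
    gal.arr_.notes
except AttributeError as error:
    print(error)
```

The design point is that conversion happens on access, so the stored attributes keep whatever unit the user supplied.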

## Using the `array_accessor`

It is a known issue that Astropy units can slow down complex computations.

To avoid this, developers usually choose to uniformize units and convert the values to numpy arrays to operate on them faster; reverting back to values with units at the end of the calculation.

As a helper, `array_accessor` performs this transformation transparently for the user, avoiding the need to repeat unit information throughout the code.

For example, if we wanted to write code that generates a new Galaxy object with a single particle that is the mean of all the others, we could do something like this:

In [None]:
@uttr.s
class Galaxy:
    x = uttr.ib(unit=u.kpc)
    y = uttr.ib(unit=u.kpc)
    z = uttr.ib(unit=u.kpc)

    vx = uttr.ib(unit=u.km / u.s)
    vy = uttr.ib(unit=u.km / u.s)
    vz = uttr.ib(unit=u.km / u.s)

    m = uttr.ib(unit=u.M_sun)

    notes = attr.ib(validator=attr.validators.instance_of(str))

    def mean(self):
        x = np.mean(self.arr_.x)
        y = np.mean(self.arr_.y)
        z = np.mean(self.arr_.z)

        vx = np.mean(self.arr_.vx)
        vy = np.mean(self.arr_.vy)
        vz = np.mean(self.arr_.vz)

        m = np.mean(self.arr_.m)

        return Galaxy(
            x=x, y=y, z=z, vx=vx, vy=vy, vz=vz, m=m, notes=self.notes
        )

We could now create a galaxy with 1 million random elements and calculate the "average" galaxy.

In [None]:
import numpy as np

# Fix random seed
random = np.random.default_rng(seed=42)

size = 1_000_000

gal = Galaxy(
    x=random.random(size=size),
    y=random.random(size=size),
    z=random.random(size=size) * u.parsec,
    vx=random.random(size=size),
    vy=random.random(size=size),
    vz=random.random(size=size) * (u.km / u.h),
    m=random.random(size=size) * u.kg,
    notes="A random galaxy with arbitrary numbers.",
)

In [None]:
gal.mean()

To complete the example, let's see what a `mean` method would look like without `array_accessor`.

In [None]:
@uttr.s(aaccessor=None)
class Galaxy:
    x = uttr.ib(unit=u.kpc)
    y = uttr.ib(unit=u.kpc)
    z = uttr.ib(unit=u.kpc)

    vx = uttr.ib(unit=u.km / u.s)
    vy = uttr.ib(unit=u.km / u.s)
    vz = uttr.ib(unit=u.km / u.s)

    m = uttr.ib(unit=u.M_sun)

    notes = attr.ib(validator=attr.validators.instance_of(str))

    def mean(self):
        x = np.mean(self.x.to_value(u.kpc))
        y = np.mean(self.y.to_value(u.kpc))
        z = np.mean(self.z.to_value(u.kpc))

        vx = np.mean(self.vx.to_value(u.km / u.s))
        vy = np.mean(self.vy.to_value(u.km / u.s))
        vz = np.mean(self.vz.to_value(u.km / u.s))

        m = np.mean(self.m.to_value(u.M_sun))

        return Galaxy(
            x=x, y=y, z=z, vx=vx, vy=vy, vz=vz, m=m, notes=self.notes
        )
