Dear Learner,

Welcome. This is a set of notebooks designed to teach you how to use (and hopefully understand) a software package called Geomstats. Geomstats is an open-source Python package that uses concepts from Riemannian geometry to analyze data that lie on manifolds (this will be explained in further detail in the next section). Geomstats is the first software package of its kind. Before Geomstats, analyzing data on manifolds required writing your own software to implement the necessary Riemannian geometry. The Geomstats package opens this field up and makes this type of data analysis accessible to people who do not have prior knowledge of Riemannian geometry.

# Motivation for analyzing data on manifolds

What can data analysis on a manifold tell you that other types of data analysis cannot? What is the motivation to learn this type of data analysis?

Many data sets lie on a manifold. If you analyze such data without taking the manifold's properties into account, you give up the advantages that this structure offers.

Analyzing data on manifolds is advantageous for three reasons:

    1) Analyzing data on a manifold reduces the number of degrees of freedom of the system, making computations less complicated.
    2) Knowing the manifold that a data set lies on can help you model the data, which gives you predictive power.
    3) Knowing the manifold that a data set lies on will help you extract the "signal" from a noisy data set or a data set with very few data points.

### 1
The number of degrees of freedom a system has is the number of variables needed to describe the system completely. For example, an object moving freely in three dimensions requires three variables to describe its position completely, such as Cartesian coordinates $(x, y, z)$ or spherical coordinates $(r, \theta, \phi)$. If you can describe an object's motion with three variables, you would not want to use four, because keeping track of another variable is mentally taxing (if you are solving a problem on paper) and more computationally expensive (if you are solving the problem with a computer). Therefore, if you can solve the problem using fewer degrees of freedom, you will want to. If you learn that the particle is in fact constrained to move on the surface of a sphere of known radius, you would want to describe it using only two variables, $(\theta, \phi)$.

This is one of the major motivations behind using manifolds to analyze data.
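As a small illustration of the sphere example, the sketch below uses only Python's standard library (not Geomstats). The function name `sphere_to_cartesian` and the chosen angles are assumptions made for this example. It shows that two angles are enough to recover all three Cartesian coordinates of a point on a sphere, because those coordinates are tied together by a constraint.

```python
import math


def sphere_to_cartesian(theta, phi, radius=1.0):
    """Convert polar angle theta and azimuthal angle phi to (x, y, z)."""
    x = radius * math.sin(theta) * math.cos(phi)
    y = radius * math.sin(theta) * math.sin(phi)
    z = radius * math.cos(theta)
    return x, y, z


# Two numbers are enough to pin down a point on the unit sphere.
theta, phi = math.pi / 3, math.pi / 4
x, y, z = sphere_to_cartesian(theta, phi)

# The three Cartesian coordinates are not independent: they always
# satisfy the constraint x^2 + y^2 + z^2 = radius^2.
norm_squared = x**2 + y**2 + z**2

print(f"(theta, phi) = ({theta:.3f}, {phi:.3f})")
print(f"(x, y, z)    = ({x:.3f}, {y:.3f}, {z:.3f})")
print(f"x^2 + y^2 + z^2 = {norm_squared:.6f}")
```

Storing $(\theta, \phi)$ instead of $(x, y, z)$ removes the redundant variable and guarantees that every stored point really is on the sphere.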

### 2
Objects travelling along a manifold often follow geodesics on that manifold. A geodesic is the shortest path between two points in the space that contains them. For example, geodesics in flat 2D and 3D space are straight lines, because a straight line is the shortest way to get from one point to another.

(picture here) plane. point a to point b. also have a wandering line that shows something that is not the shortest distance.

However, when an object lies in a curved space, its geodesics are generally not straight lines. For example, if an object is constrained to move along the surface of a sphere, the shortest path between two points is not a straight line but a curved one: an arc of a great circle.

(show picture) sphere with two lines. One follows a straight line along the sphere, the other follows the geodesic.
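To make the sphere example concrete, here is a minimal sketch in plain Python (standard library only, not the Geomstats API). The helper names and the two sample points are assumptions chosen for illustration. It compares three ways of connecting two points on the unit sphere: the straight chord through the interior, the great-circle arc (the geodesic), and a path that stays on the surface but follows a circle of latitude.

```python
import math


def sphere_to_cartesian(theta, phi):
    """Point on the unit sphere from polar angle theta and azimuth phi."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def chord_length(p, q):
    """Straight-line distance through the interior of the sphere."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def great_circle_distance(p, q):
    """Length of the geodesic (great-circle arc) on the unit sphere."""
    dot = sum(a * b for a, b in zip(p, q))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding error
    return math.acos(dot)


# Two points at the same polar angle, on opposite sides of the sphere.
theta = math.pi / 6
p = sphere_to_cartesian(theta, 0.0)
q = sphere_to_cartesian(theta, math.pi)

chord = chord_length(p, q)
geodesic = great_circle_distance(p, q)
# Half of the circle of latitude, whose radius is sin(theta).
latitude_path = math.pi * math.sin(theta)

print(f"chord through the interior: {chord:.3f}")
print(f"great-circle arc:           {geodesic:.3f}")
print(f"path along the latitude:    {latitude_path:.3f}")
```

The chord is the shortest of the three, but it leaves the surface, so it is not available to an object constrained to the sphere. Among paths that stay on the surface, the great-circle arc is shorter than the path along the circle of latitude.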

If you did not know that the object was moving along the surface of the sphere, you would wonder why it is taking such an erratic path instead of just going straight. The motion of the particles in your system might seem random because you do not understand the space they are moving in. However, if you learn more about the space they are moving in (the surface of a sphere), you would realize that the particles are following very reasonable and predictable paths along geodesics. This would give you not only **a better understanding of how particles have moved in the past** but also **predictive power to determine how particles will move in the future**.

Similarly, if you know and understand the manifold that your data points are moving on, you will be better able to understand how they are moving. When you do not consider the manifold they live on, their motion might seem erratic and random, but when you analyze their motion along a specific manifold, you might realize that their behavior is much more predictable and ordered than you previously thought. This will give you an understanding of past behavior and predictive power for future behavior.

### 3
Knowing the manifold a data set lies on will help you extract the "signal" from a noisy data set or a data set with very few data points.

Let's dissect the "noisy data set" case. Suppose you are measuring the position of a car moving at constant velocity, but your measuring tools are very poor, so your data are noisy and look like this.

(image here) noisy linear data

How can you get any information from this? It would be very difficult if you don't have a model for what the data *should* look like. But if you know that the position of a car moving at constant velocity follows $x_f = x_i + v\,\Delta t$, then you can extract more information by fitting a straight line to your data; the slope of that line is an estimate of $v$.

(image here) noisy linear data with best fit line

If you didn't know that your data *should* lie on a line, then you might try to fit it to a more complicated curve, or you might not have been able to extract any information at all. Similarly, for more complicated data sets, knowing the manifold your data lies on can help you extract information from noisy data.
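The noisy-car example can be sketched in a few lines of plain Python. Everything specific here is an assumption made for illustration: the true starting position and velocity, the noise level, the sampling times, and the helper name `fit_line`. The data are simulated, not measured. The fit is an ordinary least-squares line, which is one common way to fit a line to noisy data.

```python
import random


def fit_line(times, positions):
    """Ordinary least-squares fit of positions = intercept + slope * times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_x = sum(positions) / n
    covariance = sum((t - mean_t) * (x - mean_x) for t, x in zip(times, positions))
    variance = sum((t - mean_t) ** 2 for t in times)
    slope = covariance / variance
    intercept = mean_x - slope * mean_t
    return slope, intercept


# Simulated measurements (hypothetical values, fixed seed for repeatability).
rng = random.Random(0)
true_x0, true_v = 2.0, 3.0
times = [0.5 * k for k in range(41)]
positions = [true_x0 + true_v * t + rng.gauss(0.0, 2.0) for t in times]

# Because we know the model is a line, two fitted numbers summarize the data.
v_est, x0_est = fit_line(times, positions)
print(f"estimated velocity:         {v_est:.2f}")
print(f"estimated initial position: {x0_est:.2f}")
```

Individual measurements are off by a couple of units each, yet the fitted slope recovers the velocity closely because the model restricts the answer to a line.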

Let's now dissect the "data set with very few data points" case, and let's again use the example of a car moving at constant velocity. Let's say you measured the initial position and initial time (point 1) and the final position and final time (point 2), and saw these two data points.

(picture) two data points

If you didn't know that the position of a car moving at constant velocity can be modelled by a line, you might not be able to accurately extrapolate the data beyond these two points. However, because you know that these two points should fall on a line, you can accurately predict where the car will be at a later time.

(picture) two data points, with extrapolation
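The two-point extrapolation amounts to a short calculation. In the sketch below, the function name and the sample measurements are hypothetical and chosen only to show the arithmetic.

```python
def extrapolate_position(t1, x1, t2, x2, t_query):
    """Predict position at t_query from two measurements, assuming constant velocity."""
    if t2 == t1:
        raise ValueError("the two measurements must be taken at different times")
    velocity = (x2 - x1) / (t2 - t1)
    return x1 + velocity * (t_query - t1)


# Point 1: position 5.0 at time 0.0. Point 2: position 65.0 at time 10.0.
predicted = extrapolate_position(0.0, 5.0, 10.0, 65.0, t_query=15.0)
print(f"predicted position at t = 15: {predicted}")
```

Two points determine the line completely, so no further measurements are needed to make the prediction, provided the constant-velocity model really holds.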

Similarly, if you know the manifold that a data set lies on and you know the way in which data moves along that manifold, you can predict the trajectory of a data point along the manifold.
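The same idea can be sketched on a curved space. The example below assumes the manifold is the unit sphere and that the data point moves at constant speed along a geodesic (a great circle). It uses plain Python rather than Geomstats, and the function name `geodesic_point` and the two sample observations are assumptions made for illustration. For comparison, it also computes a naive straight-line extrapolation that ignores the manifold.

```python
import math


def geodesic_point(p, q, t):
    """Point at parameter t on the unit-sphere geodesic with t=0 at p and t=1 at q."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    omega = math.acos(dot)  # angle between p and q
    sin_omega = math.sin(omega)
    if abs(sin_omega) < 1e-12:
        raise ValueError("p and q must be distinct and not antipodal")
    weight_p = math.sin((1.0 - t) * omega) / sin_omega
    weight_q = math.sin(t * omega) / sin_omega
    return tuple(weight_p * a + weight_q * b for a, b in zip(p, q))


# Two observed positions on the unit sphere, one time step apart.
p = (1.0, 0.0, 0.0)
q = (0.0, 1.0, 0.0)

# Prediction one more time step ahead (t = 2), following the great circle.
predicted = geodesic_point(p, q, 2.0)

# Naive straight-line extrapolation that ignores the sphere.
naive = tuple(b + (b - a) for a, b in zip(p, q))

print("geodesic prediction:", tuple(round(c, 6) for c in predicted))
print("naive prediction:   ", naive)
print("norm of naive prediction:", round(math.sqrt(sum(c * c for c in naive)), 3))
```

The geodesic prediction stays on the sphere, while the straight-line prediction has norm greater than 1 and therefore describes a position the data point could never occupy.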

# What you will learn

Geomstats is designed to be intuitive and user friendly, but having some knowledge of Riemannian geometry will put you in a good position to use Geomstats most effectively. Therefore, in the next three notebooks, we will give you an overview of three of the most important parent classes in Geomstats, along with a description of the mathematical concepts that are implemented in each.

The three most important parent classes are:

    1) Manifold
    2) Connection
    3) RiemannianMetric
    
One instructional notebook will be dedicated to each of these parent classes, starting with Manifold. In each of these notebooks, you should expect to gain an understanding of

    1) the structure/hierarchy of the Geomstats code and the class being discussed
    2) how to perform calculations on manifolds
    3) how and where this mathematics is implemented in the code

# Beginning to build a hierarchical map

Now that we know about these three parent classes, we will begin to draw a hierarchical map of Geomstats, which we will build out as we learn more about each parent class.

(put picture map here)

In the next notebook, we will discuss the Manifold class.