This repository was archived by the owner on Jan 10, 2025. It is now read-only.
2 changes: 2 additions & 0 deletions README.md
@@ -2,6 +2,8 @@
Foundaml is a service that enables machine learning predictions to be made, stored and associated with their labels.
Data is the core problem of machine learning. Foundaml helps you to manage your machine learning pipeline and to develop successful machine learning projects.

[Getting started](https://foundaml.github.io/server/)

## Predict
Foundaml does not execute algorithms on its own. It needs to be paired with other software, such as TensorFlow Serving, to be able to generate predictions.

12 changes: 6 additions & 6 deletions build.sbt
@@ -48,14 +48,14 @@ lazy val root = (project in file("."))
micrositeName := "FoundaML",
micrositeDescription := "Pipeline for machine learning algorithms",
micrositeAuthor := "FoundaML contributors",
micrositeOrganizationHomepage := "https://github.com/antoinesauray/foundaml-server",
micrositeGitterChannelUrl := "antoinesauray/foundaml-server",
micrositeGithubOwner := "antoinesauray",
micrositeGithubRepo := "foundaml-server",
micrositeOrganizationHomepage := "https://github.com/foundaml/server",
micrositeGitterChannelUrl := "foundaml/server",
micrositeGithubOwner := "foundaml",
micrositeGithubRepo := "server",
micrositeFavicons := Seq(
microsites.MicrositeFavicon("favicon.png", "512x512")
),
micrositeUrl := "https://antoinesauray.github.io",
micrositeBaseUrl := "/foundaml-server"
micrositeUrl := "https://foundaml.github.io",
micrositeBaseUrl := "/server"
)
.enablePlugins(MicrositesPlugin)
Binary file added src/main/resources/microsite/img/Foundaml.png
44 changes: 43 additions & 1 deletion src/main/tut/getting_started.md
@@ -7,6 +7,48 @@ title: "Getting Started"

# Getting Started

FoundaML will help you industrialize your machine learning projects. It sits between your clients (web apps, mobile apps, etc.) and your algorithms (heuristics or machine learning).

# Not ready yet
![hello](img/Foundaml.png)

FoundaML has four key concepts.

### Projects
A project is a set of algorithms working on the same data.
The principle here is that you can compare and switch algorithms only if they operate on the same data.

When you use FoundaML, you begin by defining your project: the data it will work on and the objective it will pursue. For the moment, FoundaML supports the following type of problem:

* Classification

FoundaML supports a set of generic feature types that you can combine to describe the data of a project. These include:

* Double or Float
* Integer
* String
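
As a rough illustration of how these feature types combine, the sketch below checks a row of values against an ordered list of feature classes. `IntFeature` and `StringFeature` are the class names used in the Titanic example; `DoubleFeature` and the `validate_row` helper are assumptions made for illustration and are not part of FoundaML.

```python
# Hypothetical client-side check; not part of FoundaML.
# Maps a feature class name to the Python types accepted for it.
ACCEPTED_TYPES = {
    "IntFeature": (int,),
    "DoubleFeature": (int, float),  # assumed name for "Double or Float"
    "StringFeature": (str,),
}


def validate_row(feature_classes, row):
    """Return a list of error messages; an empty list means the row matches."""
    if len(feature_classes) != len(row):
        return [f"expected {len(feature_classes)} values, got {len(row)}"]
    errors = []
    for index, (feature_class, value) in enumerate(zip(feature_classes, row)):
        accepted = ACCEPTED_TYPES[feature_class]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, accepted):
            errors.append(
                f"value {value!r} at position {index} is not a {feature_class}"
            )
    return errors
```

Running a check like this before sending a prediction request can surface a mismatched row early, on the client side.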

### Algorithms

An algorithm is code, running on another instance, that computes values from data. Algorithms in the same project work on the same data. If necessary, they can reprocess it in a preprocessing pipeline ([an example with TensorFlow Transform](https://github.com/tensorflow/transform)). This can be useful if you need to normalize your data or try different word embeddings on an NLP problem.


FoundaML can support various backends. Currently, it supports the following API:

* TensorFlow Serving API

To convert the project features into the algorithm's input features, FoundaML needs a **features transformer**. The reverse conversion, from the algorithm's labels to the project's labels, is done by a **labels transformer**.


### Predictions

A prediction belongs to a project and an algorithm. Unless the client specifies an algorithm explicitly, the project policy decides which algorithm is executed. The following policies are available:

* **No Algorithm** (the default): a project denies predictions until an algorithm is created.
* **DefaultAlgorithm** executes the same algorithm every time.
* **RoundRobin** lets you specify a weight for each algorithm in your project. This is helpful for A/B testing.
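
One way to picture a weighted RoundRobin policy is the sketch below. It is only an illustration under stated assumptions: the source does not say whether FoundaML picks algorithms deterministically or at random, and the integer weights, the algorithm ids and the `weighted_round_robin` helper are all hypothetical.

```python
import itertools


def weighted_round_robin(weights):
    """Return an endless iterator over algorithm ids.

    `weights` maps an algorithm id to a positive integer weight. Within one
    cycle, each id appears as many times as its weight.
    """
    schedule = [
        algorithm_id
        for algorithm_id, weight in weights.items()
        for _ in range(weight)
    ]
    if not schedule:
        raise ValueError("at least one algorithm with a positive weight is required")
    return itertools.cycle(schedule)


# Hypothetical ids: algorithm-a receives three requests for each one sent to algorithm-b.
picker = weighted_round_robin({"algorithm-a": 3, "algorithm-b": 1})
first_eight = [next(picker) for _ in range(8)]
```

With weights of 3 and 1, three quarters of the traffic goes to the first algorithm, which is the kind of split an A/B test typically needs.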


### Examples
Each prediction comes with a set of URLs that allow you to tag it as correct or incorrect. This helps you build a labeled dataset and evaluate your algorithms in real time.

Let's now move on to the [Titanic Example](https://foundaml.github.io/server/the_titanic.html)
270 changes: 270 additions & 0 deletions src/main/tut/the_titanic.md
@@ -0,0 +1,270 @@
---
layout: page
position: 3
section: home
title: "Example: The Titanic"
---

# The Titanic (Work in progress)
The Titanic challenge is a popular Kaggle contest. It makes a good example of how to use FoundaML on a real problem.

# Creating the project
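To solve this problem, we first need to describe the data on which our algorithms will operate. The dataset has the following columns. `Survived` is the value to predict, so it becomes a label rather than a feature.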

* PassengerId
* Survived
* Pclass
* Name
* Sex
* Age
* SibSp
* Parch
* Ticket
* Fare
* Cabin
* Embarked

The algorithm should predict whether or not the person survived. This is a classification problem. We can now create the project with a curl request.

```
curl -X POST \
  http://localhost:8080/projects/ \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
    "id": "kaggle-titanic",
    "name": "Kaggle Titanic",
    "configuration": {
      "problem": {
        "class": "Classification"
      },
      "features": {
        "featuresClasses": [
          {
            "name": "passengerId",
            "featureClass": "IntFeature",
            "description": "The unique identifier of the passenger"
          },
          {
            "name": "pClass",
            "featureClass": "IntFeature",
            "description": "Class of travel"
          },
          {
            "name": "name",
            "featureClass": "StringFeature",
            "description": "Name of passenger"
          },
          {
            "name": "sex",
            "featureClass": "StringFeature",
            "description": "Gender"
          },
          {
            "name": "age",
            "featureClass": "IntFeature",
            "description": "Age"
          },
          {
            "name": "sibSp",
            "featureClass": "IntFeature",
            "description": "Number of Sibling/Spouse aboard"
          },
          {
            "name": "pArch",
            "featureClass": "IntFeature",
            "description": "Number of Parent/Child aboard"
          },
          {
            "name": "ticket",
            "featureClass": "StringFeature",
            "description": "The ticket identifier"
          },
          {
            "name": "fare",
            "featureClass": "StringFeature",
            "description": "Which fare"
          },
          {
            "name": "cabin",
            "featureClass": "StringFeature",
            "description": "Which cabin"
          },
          {
            "name": "embarked",
            "featureClass": "StringFeature",
            "description": "The port in which a passenger has embarked. C - Cherbourg, S - Southampton, Q = Queenstown"
          }
        ]
      },
      "labels": [
        "survived",
        "notSurvived"
      ]
    }
  }'
```
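
If you prefer to build this request from a script, the same payload can be assembled programmatically. The sketch below is one possible client, not an official one: `build_project` and `create_project` are hypothetical helper names, and `create_project` assumes a FoundaML server is reachable at the base URL you pass in.

```python
import json
import urllib.request

# (name, featureClass, description) triples copied from the curl request above.
FEATURES = [
    ("passengerId", "IntFeature", "The unique identifier of the passenger"),
    ("pClass", "IntFeature", "Class of travel"),
    ("name", "StringFeature", "Name of passenger"),
    ("sex", "StringFeature", "Gender"),
    ("age", "IntFeature", "Age"),
    ("sibSp", "IntFeature", "Number of Sibling/Spouse aboard"),
    ("pArch", "IntFeature", "Number of Parent/Child aboard"),
    ("ticket", "StringFeature", "The ticket identifier"),
    ("fare", "StringFeature", "Which fare"),
    ("cabin", "StringFeature", "Which cabin"),
    ("embarked", "StringFeature",
     "The port in which a passenger has embarked. "
     "C - Cherbourg, S - Southampton, Q = Queenstown"),
]


def build_project(project_id, name, features, labels):
    """Assemble the JSON body expected by POST /projects/ (shape taken from the curl request)."""
    return {
        "id": project_id,
        "name": name,
        "configuration": {
            "problem": {"class": "Classification"},
            "features": {
                "featuresClasses": [
                    {"name": n, "featureClass": c, "description": d}
                    for n, c, d in features
                ]
            },
            "labels": labels,
        },
    }


def create_project(base_url, project):
    """Send the project to the server and return the HTTP status code.

    Hypothetical helper: it is not called here because it needs a running server.
    """
    request = urllib.request.Request(
        url=f"{base_url}/projects/",
        data=json.dumps(project).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


project = build_project(
    "kaggle-titanic", "Kaggle Titanic", FEATURES, ["survived", "notSurvived"]
)
```

Calling `create_project("http://localhost:8080", project)` would then send the same request as the curl command.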

# Our first algorithm, a simple heuristic
Our first algorithm will be quite simple: this [simple heuristic](https://github.com/foundaml/titanic-heuristic).
What it computes does not really matter at this point.

We will use the [TensorFlow Serving API](https://www.tensorflow.org/tfx/serving/api_rest) for our algorithm. We suggest reading about it before continuing with this example.

### Features transformation
This JSON object represents the mapping between our project features and the TensorFlow Serving API.

```
"featuresTransformer": {
  "signatureName": "",
  "fields": [
    "passenger_id",
    "p_class",
    "name",
    "sex",
    "age",
    "sib_sp",
    "p_arch",
    "ticket",
    "fare",
    "cabin",
    "embarked"
  ]
}
```
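
To make the mapping concrete, the sketch below shows one way such a transformer could turn an ordered list of project feature values into a TensorFlow Serving REST request body in row format (`instances`). Pairing values with fields by position is an assumption about how FoundaML applies the transformer, and `to_tf_serving_request` is a hypothetical helper.

```python
FEATURES_TRANSFORMER = {
    "signatureName": "",
    "fields": [
        "passenger_id", "p_class", "name", "sex", "age", "sib_sp",
        "p_arch", "ticket", "fare", "cabin", "embarked",
    ],
}


def to_tf_serving_request(transformer, data):
    """Pair each value with the field at the same position (assumed behavior)."""
    fields = transformer["fields"]
    if len(fields) != len(data):
        raise ValueError(f"expected {len(fields)} values, got {len(data)}")
    body = {"instances": [dict(zip(fields, data))]}
    # signature_name is optional in the TensorFlow Serving REST API,
    # so it is only sent when the transformer defines one.
    if transformer.get("signatureName"):
        body["signature_name"] = transformer["signatureName"]
    return body


# First passenger of the Kaggle dataset, in project feature order.
row = [1, 3, "Braund Mr. Owen Harris", "male", 22, 1, 0, "A/5 21171", "7.25", "", "S"]
body = to_tf_serving_request(FEATURES_TRANSFORMER, row)
```

The resulting body has a single instance whose keys are the algorithm's field names, for example `p_class` instead of the project's `pClass`.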
### Labels transformation
The output of the algorithm may not exactly match the labels of our project. We can define a transformation that maps one to the other.
```
"labelsTransformer": {
  "fields": {
    "survived": "survived",
    "did_not_survived": "notSurvived"
  }
}
```
The keys of the ``fields`` object are the outputs of the algorithm. Each value is the project label that the output maps to.
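
As a minimal sketch of this mapping, the function below renames the keys of an algorithm output to project labels. The shape of the algorithm output (a dictionary from output name to probability) is an assumption, and `to_project_labels` is a hypothetical helper.

```python
LABELS_TRANSFORMER = {
    "fields": {
        "survived": "survived",
        "did_not_survived": "notSurvived",
    }
}


def to_project_labels(transformer, algorithm_output):
    """Rename each algorithm output to the project label it maps to."""
    mapping = transformer["fields"]
    project_labels = {}
    for algorithm_label, probability in algorithm_output.items():
        if algorithm_label not in mapping:
            raise KeyError(f"no mapping for algorithm output {algorithm_label!r}")
        project_labels[mapping[algorithm_label]] = probability
    return project_labels


# Assumed output shape: one probability per algorithm output name.
labels = to_project_labels(
    LABELS_TRANSFORMER, {"survived": 0.0, "did_not_survived": 1.0}
)
```

Failing loudly on an unmapped output is a design choice of this sketch: it keeps a misconfigured transformer from silently dropping a label.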

## Adding our algorithm to the project
We can summarize the information above with this HTTP request.

```
curl -X POST \
  http://localhost:8080/algorithms/ \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
    "id": "tf-kaggle-titanic-1",
    "projectId": "kaggle-titanic",
    "backend": {
      "class": "TensorFlowBackend",
      "host": "127.0.0.1",
      "port": 3000,
      "featuresTransformer": {
        "signatureName": "",
        "fields": [
          "passenger_id",
          "p_class",
          "name",
          "sex",
          "age",
          "sib_sp",
          "p_arch",
          "ticket",
          "fare",
          "cabin",
          "embarked"
        ]
      },
      "labelsTransformer": {
        "fields": {
          "survived": "survived",
          "did_not_survived": "notSurvived"
        }
      }
    }
  }'
```

# Start predicting labels
We should now have a working algorithm and can start making predictions. Let's take the first sample of the Kaggle dataset.

```
curl -X POST \
  http://localhost:8080/predictions \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
    "projectId": "kaggle-titanic",
    "algorithmId": "tf-kaggle-titanic-1",
    "features": {
      "class": "CustomFeatures",
      "data": [
        1,
        3,
        "Braund Mr. Owen Harris",
        "male",
        22,
        1,
        0,
        "A/5 21171",
        "7.25",
        "",
        "S"
      ]
    }
  }'
```

If our algorithm is correctly configured, we will get a response like the one below.

```
{
  "id": "5c93c052-be9e-4b1c-bfda-fbd3c0514966",
  "projectId": "kaggle-titanic",
  "algorithmId": "tf-kaggle-titanic-1",
  "features": {
    "data": [
      1,
      3,
      "Braund Mr. Owen Harris",
      "male",
      22,
      1,
      0,
      "A/5 21171",
      "7.25",
      "",
      "S"
    ],
    "class": "CustomFeatures"
  },
  "labels": {
    "labels": [
      {
        "id": "4e1574f9-5376-434d-ada0-b74ed18ca50c",
        "label": "survived",
        "probability": 0,
        "correctExampleUrl": "/examples?predictionId=5c93c052-be9e-4b1c-bfda-fbd3c0514966&labelId=4e1574f9-5376-434d-ada0-b74ed18ca50c&isCorrect=true",
        "incorrectExampleUrl": "/examples?predictionId=5c93c052-be9e-4b1c-bfda-fbd3c0514966&labelId=4e1574f9-5376-434d-ada0-b74ed18ca50c&isIncorrect=true",
        "class": "ClassificationLabel"
      },
      {
        "id": "4f9e4773-a31d-4d49-a63e-360947060363",
        "label": "notSurvived",
        "probability": 1,
        "correctExampleUrl": "/examples?predictionId=5c93c052-be9e-4b1c-bfda-fbd3c0514966&labelId=4f9e4773-a31d-4d49-a63e-360947060363&isCorrect=true",
        "incorrectExampleUrl": "/examples?predictionId=5c93c052-be9e-4b1c-bfda-fbd3c0514966&labelId=4f9e4773-a31d-4d49-a63e-360947060363&isIncorrect=true",
        "class": "ClassificationLabel"
      }
    ]
  },
  "examples": []
}
```
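
A client will usually want the most probable label from such a response. The sketch below parses a trimmed copy of the response above (only the fields it reads are kept) and picks the label with the highest probability; `most_probable_label` is a hypothetical helper.

```python
import json

# Trimmed copy of the response above: only the fields used here are kept.
RESPONSE_TEXT = """
{
  "id": "5c93c052-be9e-4b1c-bfda-fbd3c0514966",
  "labels": {
    "labels": [
      {"label": "survived", "probability": 0},
      {"label": "notSurvived", "probability": 1}
    ]
  }
}
"""


def most_probable_label(prediction):
    """Return the label object with the highest probability."""
    labels = prediction["labels"]["labels"]
    return max(labels, key=lambda label: label["probability"])


prediction = json.loads(RESPONSE_TEXT)
best = most_probable_label(prediction)
```

Here `best["label"]` is `notSurvived`, matching the response shown above.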

Notice that we predicted that the passenger would not survive, with a probability of 1! Pay attention to `correctExampleUrl` and `incorrectExampleUrl`. These links are relative to the root of the FoundaML server.

They allow you to label a prediction correct or incorrect. If you can have humans validate your predictions, this is extremely valuable.
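
Because the links are relative, a client has to resolve them against the server root before calling them. The sketch below does this with the standard library, assuming the server runs at `http://localhost:8080` as in the curl examples. The source does not state which HTTP method the `/examples` endpoint expects, so the sketch only builds and inspects the URL.

```python
from urllib.parse import parse_qs, urljoin, urlsplit

SERVER_ROOT = "http://localhost:8080"  # assumed, as in the curl examples

# correctExampleUrl of the "survived" label in the response above.
correct_example_url = (
    "/examples?predictionId=5c93c052-be9e-4b1c-bfda-fbd3c0514966"
    "&labelId=4e1574f9-5376-434d-ada0-b74ed18ca50c&isCorrect=true"
)

# Resolve the relative link against the server root.
absolute_url = urljoin(SERVER_ROOT, correct_example_url)

# Inspect the query parameters, e.g. to log which label is being tagged.
params = parse_qs(urlsplit(absolute_url).query)
```

`params` maps each query parameter to a list of values, so the prediction id is available as `params["predictionId"][0]`.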

You should also be aware that each prediction and example gets published to a streaming platform (only Amazon Kinesis at the moment). This allows you to evaluate your algorithms in real time.