# Collecting data

In this part of the tutorial, we'll learn how to record our users' data.

Run the cell below to create a test application.

In [None]:
import os

from hemlock import User, Page, create_test_app
from hemlock.questions import Check, Input, Label
from hemlock.timer import Timer

os.environ.pop("GITPOD_HOST", None)

app = create_test_app()

We can get a user's data with the `get_data` method.

In [None]:
user = User.make_test_user()
user.get_data()

To record a user's response to a question, pass `variable="some_variable_name"` to the question.

In the example below, we ask the user for their name and record it in a variable called `"name"`.

In [None]:
def seed():
    return [
        Page(
            Input(
                "What's your name?",
                variable="name"
            )
        ),
        Page(
            Label("Goodbye!")
        )
    ]


user = User.make_test_user(seed)
user.test_request(["Plato"])
user.get_data()

We can recode the values of `Check` and `Select` questions by passing each choice as a `(value, label)` tuple. The label is what the user sees; the value is what gets recorded.

In the example below, we ask the user how they're feeling today. The choices are labeled `"Bad"`, `"Neutral"`, and `"Good"`, with values `-1`, `0`, and `1` respectively.

During testing, the test response is the choice's *value*, not the choice's *label*. For example, if we want the user to say they're feeling good, the test response is `1`, not `"Good"`.

In [None]:
def seed():
    return [
        Page(
            Check(
                "How are you feeling today?",
                [(-1, "Bad"), (0, "Neutral"), (1, "Good")],
                variable="mood"
            )
        ),
        Page(
            Label("Goodbye!")
        )
    ]

user = User.make_test_user(seed)
user.test_request([1])
user.get_data()
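
To make the value/label distinction concrete, here is a minimal, library-free sketch of how `(value, label)` tuples could be resolved. It is not hemlock's implementation; `CHOICES`, `displayed_labels`, and `recorded_value` are hypothetical names used only for illustration.

```python
# Hypothetical model of a choice question's (value, label) tuples.
# Each tuple is (value recorded in the data, label shown to the user).
CHOICES = [(-1, "Bad"), (0, "Neutral"), (1, "Good")]


def displayed_labels(choices):
    """Return the labels the user would see, in order."""
    return [label for _value, label in choices]


def recorded_value(choices, selected_label):
    """Return the value that would be recorded when the user picks a label."""
    for value, label in choices:
        if label == selected_label:
            return value
    raise ValueError(f"unknown choice: {selected_label!r}")


print(displayed_labels(CHOICES))        # ['Bad', 'Neutral', 'Good']
print(recorded_value(CHOICES, "Good"))  # 1
```

This is also why the test response above is `1` rather than `"Good"`: tests work with the recorded value, not the displayed label.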

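The data from questions that allow multiple selections (`multiple=True`) are automatically one-hot encoded: each choice gets its own column indicating whether the user selected it.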

In [None]:
def seed():
    return [
        Page(
            Check(
                "Which composers do you like?",
                ["Mozart", "Chopin", "Beethoven"],
                multiple=True,
                variable="composer"
            )
        ),
        Page(
            Label("Goodbye!")
        )
    ]

user = User.make_test_user(seed)
user.test_request([{"Chopin", "Beethoven"}])
user.get_data()
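
One-hot encoding turns a set of selected choices into one indicator column per choice. The sketch below shows the idea in plain Python; the `one_hot` helper and the `"<variable>_<choice>"` column naming are assumptions for illustration, so check the columns returned by `user.get_data()` for hemlock's actual names and value types.

```python
def one_hot(variable, choices, selected):
    """Return one column per choice: 1 if the choice was selected, else 0.

    The "<variable>_<choice>" naming is a hypothetical convention for this sketch.
    """
    return {f"{variable}_{choice}": int(choice in selected) for choice in choices}


row = one_hot("composer", ["Mozart", "Chopin", "Beethoven"], {"Chopin", "Beethoven"})
print(row)  # {'composer_Mozart': 0, 'composer_Chopin': 1, 'composer_Beethoven': 1}
```

Because every choice always gets a column, users who select different combinations still produce dataframes with the same columns.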

We can also record data by passing `data=[("variable_name", value)]` to a page.

In the example below, we add data indicating that the user is on the red team.

In [None]:
def seed():
    return [
        Page(
            Label("Goodbye!"),
            data=[("team", "red")]
        )
    ]

user = User.make_test_user(seed)
user.test_get()
user.get_data()

If the value of the data is a `list`, each item of the list will appear on a different row of the dataframe.

For example, imagine we want to ask an employee about their experiences working with 2 teams: the red team and the blue team. We want to record the employee's experience with the red team on one row and their experience with the blue team on another.

In [None]:
def seed():
    return [
        Page(
            Label("Goodbye!"),
            data=[("team", ["red", "blue"])]
        )
    ]

user = User.make_test_user(seed)
user.test_get()
user.get_data()
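
The rule "a scalar fills one row, a list spreads over rows" can be modeled in a few lines of plain Python. This is a hypothetical sketch (`rows_from_page_data` is not a hemlock function); leaving cells empty (`None` here) when one variable has fewer values than another is an assumption of the sketch.

```python
def rows_from_page_data(data):
    """Turn [(variable, value), ...] into a list of row dicts.

    A scalar value occupies a single row; a list value puts one item on each row.
    Shorter columns are padded with None so every row has every variable.
    """
    columns = {
        variable: value if isinstance(value, list) else [value]
        for variable, value in data
    }
    n_rows = max((len(values) for values in columns.values()), default=0)
    return [
        {
            variable: values[i] if i < len(values) else None
            for variable, values in columns.items()
        }
        for i in range(n_rows)
    ]


print(rows_from_page_data([("team", "red")]))
# [{'team': 'red'}]
print(rows_from_page_data([("team", ["red", "blue"])]))
# [{'team': 'red'}, {'team': 'blue'}]
```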

We often want the data from one question to appear on all rows of a user's dataframe. We can do this by passing `fill_rows=True` to a question.

In the example below, we ask for the user's name and use page data to create 2 rows: 1 for the red team and 1 for the blue team. We want the user's name to appear in both rows.

In [None]:
def seed():
    return [
        Page(
            Input(
                "What's your name?",
                variable="name",
                fill_rows=True
            ),
            data=[("team", ["red", "blue"])]
        ),
        Page(
            Label("Goodbye!")
        )
    ]


user = User.make_test_user(seed)
user.test_request(["Plato"])
user.get_data()
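
Conceptually, `fill_rows=True` broadcasts a single value to every row instead of leaving later rows empty. The sketch below models that with plain dictionaries; `build_rows` is a hypothetical helper, and padding non-filled variables with `None` is an assumption, not a description of hemlock's internals.

```python
def build_rows(columns, fill_rows=()):
    """Combine per-variable value lists into rows.

    Variables named in fill_rows have their first value repeated on every row;
    other variables are padded with None (an assumption of this sketch).
    """
    n_rows = max((len(values) for values in columns.values()), default=0)
    rows = []
    for i in range(n_rows):
        row = {}
        for variable, values in columns.items():
            if variable in fill_rows:
                row[variable] = values[0] if values else None
            else:
                row[variable] = values[i] if i < len(values) else None
        rows.append(row)
    return rows


columns = {"name": ["Plato"], "team": ["red", "blue"]}
print(build_rows(columns))
# [{'name': 'Plato', 'team': 'red'}, {'name': None, 'team': 'blue'}]
print(build_rows(columns, fill_rows={"name"}))
# [{'name': 'Plato', 'team': 'red'}, {'name': 'Plato', 'team': 'blue'}]
```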

Multiple questions with the same variable name will appear in different rows of the user's dataframe.

In the example below, we ask the user for their willingness to pay (WTP) for 3 items. We record the data for all 3 items using the variable name `"wtp"`. The resulting dataframe has 3 rows: 1 for each item.

In [None]:
def seed():
    return [
        Page(
            *[
                Input(
                    f"How much would you pay for item {item_number}?",
                    variable="wtp",
                    input_tag={"type": "number", "min": 0}
                )
                for item_number in range(3)
            ]
        ),
        Page(
            Label("Goodbye!")
        )
    ]


user = User.make_test_user(seed)
# willingness to pay is $10 for item 0, $11 for item 1, and $12 for item 2
user.test_request([10, 11, 12])
user.get_data()
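
One way to picture this: responses are grouped by variable name in question order, so the i-th question using `"wtp"` lands on row i. The sketch below is a hypothetical plain-Python model of that grouping (`collect_responses` is not part of hemlock).

```python
def collect_responses(questions):
    """Group (variable, response) pairs by variable, preserving question order.

    Each repeated variable name appends one more entry, so the i-th question
    with a given name ends up on row i.
    """
    columns = {}
    for variable, response in questions:
        columns.setdefault(variable, []).append(response)
    return columns


responses = [("wtp", 10), ("wtp", 11), ("wtp", 12)]
columns = collect_responses(responses)
rows = [{"wtp": value} for value in columns["wtp"]]
print(columns)  # {'wtp': [10, 11, 12]}
print(rows)     # [{'wtp': 10}, {'wtp': 11}, {'wtp': 12}]
```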

We can record the number of seconds a user spends on a page by passing `timer="timer_variable_name"` to the page.

In the example below, we record the time a user spends on page 0 in the variable `"seconds_spent_on_page0"`.

In [None]:
def seed():
    return [
        Page(
            Label("Hello, world!"),
            timer="seconds_spent_on_page0"
        ),
        Page(
            Label("Goodbye, world!")
        )
    ]

user = User.make_test_user(seed)
user.test_request()
user.get_data()
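
A page timer boils down to measuring the elapsed time between showing a page and receiving the user's submission. The sketch below is a hypothetical stand-in (`PageTimer` is not hemlock's `Timer`); accumulating time across repeated visits to the same page is a design choice of this sketch, and `time.monotonic` is used so clock adjustments can't produce negative durations.

```python
import time


class PageTimer:
    """Hypothetical page timer: accumulates seconds between start() and stop()."""

    def __init__(self, variable):
        self.variable = variable
        self.total_seconds = 0.0
        self._started_at = None

    def start(self):
        # e.g., called when the page is sent to the user
        self._started_at = time.monotonic()

    def stop(self):
        # e.g., called when the user submits the page; ignored if not started
        if self._started_at is not None:
            self.total_seconds += time.monotonic() - self._started_at
            self._started_at = None

    def data(self):
        return {self.variable: self.total_seconds}


timer = PageTimer("seconds_spent_on_page0")
timer.start()
time.sleep(0.1)  # stand-in for the user reading the page
timer.stop()
print(timer.data())
```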

We can make the timer's data appear on all rows of the dataframe by passing `timer=Timer("timer_variable_name", fill_rows=True)`.

In [None]:
def seed():
    return [
        Page(
            Label("Hello, world!"),
            data=[("team", ["red", "blue"])],
            timer=Timer("seconds_spent_on_page0", fill_rows=True)
        ),
        Page(
            Label("Goodbye, world!")
        )
    ]

user = User.make_test_user(seed)
user.test_request()
user.get_data()

## Exercises

0. Create a survey with 2 pages:

    0. A page with 3 `Check` questions asking how the user was feeling 3 days ago, 2 days ago, and yesterday. Allow multiple responses. For example, a user could be both happy and optimistic.
    1. A goodbye page.
1. Record the data for all 3 questions on page 0 in a variable `"mood"`.
2. Add data to page 0 indicating which day is associated with which row. The dataframe should then have a new column (named something like `"days_ago"`) with the values `"3 days ago"`, `"2 days ago"`, and `"yesterday"`.
3. Add a timer to record how many seconds the user spent on page 0. The data from this timer should appear on all rows of the dataframe.
4. Run a test user through your survey and display the resulting dataframe.
5. Replace the seed function in `src/my_survey.py` with the one you wrote in exercises 0-3. Run the application and go through the survey in your browser.
6. Download the data:

    0. If you're working in Gitpod, you can't do this in the simple browser. Instead, copy the URL from the simple browser into a new tab in your regular browser, then append "/admin-download" to the URL. If you're having trouble, rewatch the "Getting started" video.
    1. If you're working on your local machine, go to <http://localhost:5000/admin-download>.
7. Test your code with `make test`.

In [None]:
# WRITE YOUR CODE HERE

## Answers

In [None]:
def seed():
    days_ago = ["3 days ago", "2 days ago", "yesterday"]
    return [
        Page(
            *[
                Check(
                    f"How were you feeling {day}?",
                    ["Happy", "Sad", "Optimistic", "Pessimistic"],
                    multiple=True,
                    variable="mood"
                )
                for day in days_ago
            ],
            data=[("days_ago", days_ago)],
            timer=Timer("mood_seconds", fill_rows=True)
        ),
        Page(
            Label("Goodbye!")
        )
    ]


user = User.make_test_user(seed)
user.test_request([{"Happy"}, {"Sad"}, {"Happy", "Optimistic"}])
user.get_data()

See `src/data.py` for what this should look like in your survey file.

Now you know how to collect and download your users' data! Check out `060_compile.ipynb` for the next part of the tutorial.