# Getting Started
Before we can build a model, we need data. We'll use the [Stack Exchange API](https://api.stackexchange.com/) and the Python package [`StackAPI`](https://stackapi.readthedocs.io/en/latest/).

> NOTE: Before installing, I suggest creating a new environment so you don't mess up the dependencies of your other projects. If you don't know how to do this, see my blog post, [_We Recommend Creating an Environment_](https://medium.com/@ianiat11/we-recommend-creating-an-environment-da38af0cecbb).

# The API
The `StackAPI` package can be used to query the Stack Exchange API, which serves every site in the [Stack Exchange](https://stackexchange.com/) network. I'm only interested in [Stack Overflow](https://stackoverflow.com/), so I'm going to write a small package that restricts `StackAPI` to that site.

We'll start by creating a new Python package called `is_question`, which will eventually hold all of the source code for this project. To make the empty package, create a directory called "is_question" and put an "\_\_init\_\_.py" file inside it. Boom! Now you have an `is_question` Python package. 😎 Inside the package, add another file called "api.py". This is where we'll define our own `StackOverflow` class, which only makes API calls to the Stack Overflow site. Now add the following to "api.py":

```python
"""contents of api.py"""
from stackapi import StackAPI


class StackOverflow(StackAPI):
    """A subclass of `StackAPI` that limits API calls to StackOverflow."""
```

We've now subclassed the `StackAPI` class. Next, we restrict it to the Stack Overflow site.

> NOTE: If you're unfamiliar with **O**bject-**O**riented **P**rogramming (**OOP**), I suggest watching [this tutorial series](https://www.youtube.com/watch?v=ZDa-Z5JzLYM&list=PL-osiE80TeTsqhIuOqKhwlXsIBIdSeYtc&pp=iAQB) from [Corey Schafer](https://www.youtube.com/@coreyms/featured), or any other OOP tutorial you prefer.

```python
"""contents of api.py"""
from stackapi import StackAPI


class StackOverflow(StackAPI):
    """A subclass of `StackAPI` that limits API calls to StackOverflow."""

    def __init__(self, **kwargs):
        super().__init__(name="stackoverflow", **kwargs)
```

Within the `__init__` method, we hard-code the `name` argument as `"stackoverflow"`. If we used the `StackAPI` class as-is, we'd do this:

```python
from stackapi import StackAPI

so = StackAPI(name="stackoverflow")

# fetch some questions
response = so.fetch("questions")
print(response)
```



But because we've hard-coded the name, we don't have to specify it during instantiation:

```python
so = StackOverflow()

# fetch some questions
response = so.fetch("questions")
print(response)
```



# Quotas
Before we get ahead of ourselves, let's note a couple of important fields in the `response` variable: `"quota_max"` and `"quota_remaining"`. These fields tell us how many calls we can make to the API in total and how many we have left, respectively. Once we run out, we'll have to wait (typically 24 hours) for the quota to be replenished. We can increase both values by registering for an API key on [stackapps.com](https://stackapps.com/apps/oauth/register): without a key the daily quota is 300 requests, and with one it's 10,000. Once you've registered, copy the `key` and save it somewhere. You can ignore the `client_secret` for now, as we won't need it.

> NOTE: If you've already registered for an API key, you can reuse it. If you've lost your API key, you can find it again by visiting [stackapps.com/apps/oauth](https://stackapps.com/apps/oauth).

Using the API key is simple: we pass it as the `key` keyword argument, which our `StackOverflow` class forwards to `StackAPI` through `**kwargs`:

```python
# I would supply my key like this
# so = StackOverflow(key="my_key")
```

While this works, it means we need our key on hand every time we initialize a `StackOverflow` object. A more convenient approach is to store the `key` somewhere global, such as an environment variable, and read it inside the class's `__init__` method.

If you're using Windows, you can store the API key as an environment variable by doing the following:
1. Click "Start" (or the Windows key) and search "environment".
2. Click on "Edit environment variables for your account". Do NOT click on "Edit the system environment variables".
3. Click on "New" under the first scroll box.
4. Type a name for the environment variable in the Variable name box (I used "stackapi_key").
5. Paste your key into the Variable value box.
6. Click OK.
7. Click OK again.

I saved my key to a variable called `"stackapi_key"`, and I can access it using `os.getenv`:

```python
from os import getenv

key = getenv("stackapi_key")

print(key is None)  # prints False if the key was stored correctly
```

```
False
```


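> NOTE: If the code above prints `True`, try restarting your Python session (and the terminal or IDE it runs in), since a process only sees the environment variables that existed when it was started.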

Adding the key to our `StackOverflow.__init__` method is straightforward.

```python
"""contents of api.py"""
from os import getenv

from stackapi import StackAPI


class StackOverflow(StackAPI):
    """A subclass of `StackAPI` that limits API calls to StackOverflow."""

    def __init__(self, **kwargs):
        key = getenv("stackapi_key")
        super().__init__(name="stackoverflow", key=key, **kwargs)
```

If we run the same code as before, we should see that `quota_max` and `quota_remaining` have increased.

```python
so = StackOverflow()

# fetch some questions
response = so.fetch("questions")
print({k: v for k, v in response.items() if k.startswith("quota")})
```

```
{'quota_max': 10000, 'quota_remaining': 9914}
```


That should last a lot longer.
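The two quota fields also tell us how much of the quota has been used. As a small worked example, here is a hypothetical helper (`quota_usage` is not part of `StackAPI`) that turns them into a summary, using the numbers from the output above:

```python
def quota_usage(response: dict) -> dict[str, float]:
    """Summarize quota usage from a Stack Exchange API response (hypothetical helper)."""
    quota_max = response["quota_max"]
    remaining = response["quota_remaining"]
    used = quota_max - remaining  # requests already counted against the quota
    return {
        "used": used,
        "remaining": remaining,
        "percent_used": round(100 * used / quota_max, 2),
    }


# Numbers taken from the output above
print(quota_usage({"quota_max": 10000, "quota_remaining": 9914}))
# {'used': 86, 'remaining': 9914, 'percent_used': 0.86}
```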

What if we have multiple API keys, or want to use one that isn't saved as an environment variable? If we pass a `key` argument during instantiation, as we did earlier, we get a `TypeError`:

```python
so = StackOverflow(key="my_key")
```

```
TypeError: stackapi.stackapi.StackAPI.__init__() got multiple values for keyword argument 'key'
```

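This happens because the `key` we pass ends up in `**kwargs`, while `__init__` also passes `key=key` explicitly to `super().__init__`, so `key` is supplied twice. To get around this, we can make `key` an explicit optional parameter. If it's `None`, we'll use the API key stored in our environment variables; otherwise, we'll use the value supplied.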

```python
"""contents of api.py"""
from os import getenv

from stackapi import StackAPI


class StackOverflow(StackAPI):
    """A subclass of `StackAPI` that limits API calls to StackOverflow."""

    def __init__(self, key: str | None = None, **kwargs):
        if key is None:
            key = getenv("stackapi_key")
        super().__init__(name="stackoverflow", key=key, **kwargs)
```

Now we can supply a different `key` to our class. (Note that the `str | None` annotation syntax requires Python 3.10 or later.)

```python
# I would supply my key like this
# so = StackOverflow(key="my_key")
```

# Tracking quotas
We've found a way to increase our `quota_max`, but how can we track our `quota_remaining` without checking the `response` variable? By adding an attribute to our class.

By default, the only way to see `quota_remaining` is to inspect the response returned by `fetch`. We can override `fetch` so that it stores `quota_remaining` on the object just before returning the `response`.

```python
"""contents of api.py"""
from os import getenv
from typing import Any

from stackapi import StackAPI


class StackOverflow(StackAPI):
    """A subclass of `StackAPI` that limits API calls to StackOverflow."""

    def __init__(self, key: str | None = None, **kwargs):
        if key is None:
            key = getenv("stackapi_key")
        super().__init__(name="stackoverflow", key=key, **kwargs)
        self.quota_remaining: int | None = None

    def fetch(self, endpoint=None, page=1, key=None, filter='default', **kwargs) -> dict[str, Any]:
        response = super().fetch(endpoint=endpoint, page=page, key=key, filter=filter, **kwargs)
        self.quota_remaining = response.get("quota_remaining")
        return response
```

Now when we make an API call, we can see our `quota_remaining` without having to store the `response`.

```python
so = StackOverflow()

response = so.fetch("questions")
print(so.quota_remaining)
```

```
9994
```


This works, but it isn't as foolproof as I'd like. If we accidentally overwrote the `quota_remaining` attribute, we'd be none the wiser.

```python
so.quota_remaining = 10_000
print(so.quota_remaining)
```

```
10000
```


To avoid this, we'll convert `quota_remaining` into a read-only property and store the actual value in a private (underscore-prefixed) attribute.

```python
"""contents of api.py"""
from os import getenv
from typing import Any

from stackapi import StackAPI


class StackOverflow(StackAPI):
    """A subclass of `StackAPI` that limits API calls to StackOverflow."""

    def __init__(self, key: str | None = None, **kwargs):
        if key is None:
            key = getenv("stackapi_key")
        super().__init__(name="stackoverflow", key=key, **kwargs)
        self._quota_remaining: int | None = None

    def fetch(self, endpoint=None, page=1, key=None, filter='default', **kwargs) -> dict[str, Any]:
        response = super().fetch(endpoint=endpoint, page=page, key=key, filter=filter, **kwargs)
        self._quota_remaining = response.get("quota_remaining")
        return response

    @property
    def quota_remaining(self) -> int | None:
        return self._quota_remaining
```

Now, trying to overwrite the property raises an `AttributeError`:

```python
so = StackOverflow()

response = so.fetch("questions")
print(so.quota_remaining)

so.quota_remaining = 10_000
print(so.quota_remaining)
```

```
9988

AttributeError: property 'quota_remaining' of 'StackOverflow' object has no setter
```

# Questions
This next step is optional. We already have a functioning class that can use our registered API key, track our remaining quota, and make API calls to Stack Overflow. But because this project is focused solely on questions, I'm going to create a method that only fetches questions: `fetch_questions`.

The method will work in two ways:
1. It will call the endpoint `"questions"` (what we've already seen).
2. It will call the endpoint `"questions/{ids}"`, which returns specific questions identified by up to 100 `question_id` values.
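For the second form, the Stack Exchange API expects `{ids}` in the URL path to be a semicolon-delimited list of at most 100 ids. `StackAPI` fills in the placeholder for us, so the sketch below only illustrates the format and one way to batch a longer list if you ever need more than 100 questions. `chunk_ids` and `ids_path` are hypothetical helpers, not part of `StackAPI`.

```python
from collections.abc import Iterator, Sequence

MAX_IDS_PER_REQUEST = 100  # Stack Exchange API limit for a vectorized {ids} path


def chunk_ids(ids: Sequence[int], size: int = MAX_IDS_PER_REQUEST) -> Iterator[list[int]]:
    """Yield successive batches of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def ids_path(ids: Sequence[int]) -> str:
    """Format ids the way the API expects in the path, e.g. 'questions/1;2;3'."""
    if not 0 < len(ids) <= MAX_IDS_PER_REQUEST:
        raise ValueError(f"expected 1 to {MAX_IDS_PER_REQUEST} ids, got {len(ids)}")
    return "questions/" + ";".join(str(i) for i in ids)


batches = list(chunk_ids(list(range(1, 251))))  # 250 made-up ids
print([len(b) for b in batches])  # [100, 100, 50]
print(ids_path([1, 2, 3]))        # questions/1;2;3
```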

We'll start with the first way:

```python
"""contents of api.py"""
from os import getenv
from typing import Any

from stackapi import StackAPI


class StackOverflow(StackAPI):
    """A subclass of `StackAPI` that limits API calls to StackOverflow."""

    def __init__(self, key: str | None = None, **kwargs):
        if key is None:
            key = getenv("stackapi_key")
        super().__init__(name="stackoverflow", key=key, **kwargs)
        self._quota_remaining: int | None = None

    def fetch(
            self, endpoint=None, page=1, key=None, filter='default', **kwargs
    ) -> dict[str, Any]:
        response = super().fetch(
            endpoint=endpoint, page=page, key=key, filter=filter, **kwargs
        )
        self._quota_remaining = response.get("quota_remaining")
        return response

    @property
    def quota_remaining(self) -> int | None:
        return self._quota_remaining

    def fetch_questions(self, **kwargs) -> dict[str, Any]:
        """Fetch questions from Stack Overflow."""
        endpoint = "questions"
        response = self.fetch(endpoint=endpoint, **kwargs)
        return response
```

Now we don't have to provide the argument `"questions"` to our `fetch` method anymore.

```python
so = StackOverflow()

response = so.fetch_questions()
print(response)
```



What is this "other way" I'm referring to? Let's inspect the `response` variable for an example:

```python
items = response.get("items")
print(items[0])
```

```
{'tags': ['android', 'kotlin', 'animation', 'android-recyclerview'], 'owner': {'reputation': 21, 'user_id': 19251805, 'user_type': 'registered', 'profile_image': 'https://lh3.googleusercontent.com/a/AATXAJzZxMUtmdcs1dwvGP4oADPdUb28_KgO-22OA8IY=k-s256', 'display_name': 'Dema Dima', 'link': 'https://stackoverflow.com/users/19251805/dema-dima'}, 'is_answered': False, 'view_count': 1, 'answer_count': 0, 'score': 0, 'last_activity_date': 1688323600, 'creation_date': 1688323600, 'question_id': 76600335, 'content_license': 'CC BY-SA 4.0', 'link': 'https://stackoverflow.com/questions/76600335/animation-bug-in-recyclerview-when-fast-scrolling', 'title': 'Animation bug in RecyclerView when fast scrolling'}
```


The `response` has a key called `"items"` that holds all the questions returned by our `fetch`. Each item is a dictionary with a `"question_id"` key, a unique identifier for the question. We can fetch a specific question by supplying its `question_id` to the endpoint `"questions/{ids}"` like so:

```python
question_id = items[0].get("question_id")
single_question_response = so.fetch("questions/{ids}", ids=[question_id])
print(single_question_response)
```

```
{'backoff': 0, 'has_more': False, 'page': 1, 'quota_max': 10000, 'quota_remaining': 9981, 'total': 0, 'items': [{'tags': ['android', 'kotlin', 'animation', 'android-recyclerview'], 'owner': {'reputation': 21, 'user_id': 19251805, 'user_type': 'registered', 'profile_image': 'https://lh3.googleusercontent.com/a/AATXAJzZxMUtmdcs1dwvGP4oADPdUb28_KgO-22OA8IY=k-s256', 'display_name': 'Dema Dima', 'link': 'https://stackoverflow.com/users/19251805/dema-dima'}, 'is_answered': False, 'view_count': 4, 'answer_count': 0, 'score': 0, 'last_activity_date': 1688323600, 'creation_date': 1688323600, 'question_id': 76600335, 'content_license': 'CC BY-SA 4.0', 'link': 'https://stackoverflow.com/questions/76600335/animation-bug-in-recyclerview-when-fast-scrolling', 'title': 'Animation bug in RecyclerView when fast scrolling'}]}
```
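Above we only took the first item's id. To collect the id of every question in a response, a list comprehension over `"items"` is enough. In this minimal sketch, `extract_question_ids` is a hypothetical helper, and `sample_response` is a trimmed copy of the output above with most fields omitted:

```python
from typing import Any


def extract_question_ids(response: dict[str, Any]) -> list[int]:
    """Return the question_id of every item in a Stack Exchange API response."""
    # .get("items", []) tolerates responses without items; items lacking an id are skipped
    return [item["question_id"] for item in response.get("items", []) if "question_id" in item]


# Trimmed copy of the response shown above (most fields omitted)
sample_response = {
    "has_more": False,
    "quota_max": 10000,
    "quota_remaining": 9981,
    "items": [
        {"question_id": 76600335, "title": "Animation bug in RecyclerView when fast scrolling"},
    ],
}
print(extract_question_ids(sample_response))  # [76600335]
```

The resulting list can be passed straight to the `ids` argument, as long as it holds no more than 100 ids.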


The endpoint `"questions/{ids}"` can take up to 100 `ids`. We can make our `fetch_questions` method more flexible by adding a `question_ids` argument and building the `endpoint` dynamically.

```python
"""contents of api.py"""
from os import getenv
from typing import Any

from stackapi import StackAPI


class StackOverflow(StackAPI):
    """A subclass of `StackAPI` that limits API calls to StackOverflow."""

    def __init__(self, key: str | None = None, **kwargs):
        if key is None:
            key = getenv("stackapi_key")
        super().__init__(name="stackoverflow", key=key, **kwargs)
        self._quota_remaining: int | None = None

    def fetch(
            self, endpoint=None, page=1, key=None, filter='default', **kwargs
    ) -> dict[str, Any]:
        response = super().fetch(
            endpoint=endpoint, page=page, key=key, filter=filter, **kwargs
        )
        self._quota_remaining = response.get("quota_remaining")
        return response

    @property
    def quota_remaining(self) -> int | None:
        return self._quota_remaining

    def fetch_questions(
            self, question_ids: list[int] | None = None, **kwargs
    ) -> dict[str, Any]:
        """Fetch questions from Stack Overflow."""
        endpoint = "questions"
        if question_ids is None:
            response = self.fetch(endpoint=endpoint, **kwargs)
        else:
            endpoint += "/{ids}"
            response = self.fetch(endpoint=endpoint, ids=question_ids, **kwargs)
        return response
```

```python
so = StackOverflow()
# NOTE -- question_id was defined in the previous code block
single_question_response = so.fetch_questions(question_ids=[question_id])
print(single_question_response)
```

```
{'backoff': 0, 'has_more': False, 'page': 1, 'quota_max': 10000, 'quota_remaining': 9979, 'total': 0, 'items': [{'tags': ['android', 'kotlin', 'animation', 'android-recyclerview'], 'owner': {'reputation': 21, 'user_id': 19251805, 'user_type': 'registered', 'profile_image': 'https://lh3.googleusercontent.com/a/AATXAJzZxMUtmdcs1dwvGP4oADPdUb28_KgO-22OA8IY=k-s256', 'display_name': 'Dema Dima', 'link': 'https://stackoverflow.com/users/19251805/dema-dima'}, 'is_answered': False, 'view_count': 8, 'answer_count': 0, 'score': 0, 'last_activity_date': 1688323935, 'creation_date': 1688323600, 'last_edit_date': 1688323935, 'question_id': 76600335, 'content_license': 'CC BY-SA 4.0', 'link': 'https://stackoverflow.com/questions/76600335/animation-bug-in-recyclerview-when-fast-scrolling', 'title': 'Animation bug in RecyclerView when fast scrolling'}]}
```


We now have a convenient way to get Stack Overflow questions. In the next part we'll work on processing them.