**Contents**

1.  Object Count

2.  Summed Frequency

3.  Maps

**1. Object Count**

-   *Example query:*

    Show count of *‘Site’* per *‘Provenance’* per *‘Year’* in an amphora
    date range.

-   *Steps:*

    1.  Identify the list of (not yet unique) *‘Sites’* per
        *‘Amphora type’* per *‘Provenance’*.

        To that end, *‘Sites’* are grouped by *‘Amphora type’*,
        *‘Amphora lower date’*, *‘Amphora upper date’* and
        *‘Provenance’* with the pandas `DataFrame.groupby()` method. The
        grouped *‘Site’* names, separated by commas, are turned into a
        list of strings with the `split()` method.

        *Example*: In the Sonata dataset, for `Amphora type == AE 3`
        present in `Provenance == Egypt` with date range $-25.0$ to
        $500.0$, the list of *‘Sites’* is `[ostia terme del nuotatore]`.

    2.  Define a dictionary with years (as keys), ranging from *‘Amphora
        lower date’* to *‘Amphora upper date’* (across the dataset for a
        given *‘Provenance’*), and zeros (as values).

        Later, the site count for each year is stored as the value for
        that year.

        *Example:* In the Sonata dataset, there are 7 *‘Amphora types’*
        present in *‘Egypt’* and therefore 7 amphora date ranges. As a
        result, the dictionary contains years ranging from $-25$ to
        $499$,
      
        \begin{equation}
        dict = \{-25:0,  -24: 0, \ldots , 499: 0\}.
        \end{equation}
        

    3.  Loop over years (in a dictionary) and amphora date ranges (in a
        dataframe).

        1.  Check whether a year falls within a date range of each
            *‘Amphora type’*,

            \begin{equation}
            \textit{Amphora lower date} \leq \textbf{year} \leq \textit{Amphora upper date}.
            \end{equation}

        2.  If the condition is `False` (the year falls outside the date
            range), the loop continues with the next amphora date
            range.

        3.  If the condition is `True` (the year falls within the date
            range), the list of *‘Sites’* for that *‘Amphora type’* is
            appended to a predefined list, called *‘result’* here.

    4.  Once the iteration over all amphora date ranges is finished for
        a given year in the dictionary, *‘result’* contains from $0$
        to $N$ lists of *‘Sites’*.

        For instance,

        \begin{equation}
        [['site', 'site'], ['site'], \ldots , ['site', 'site', 'site']].
        \end{equation}

        1.  If the length of *‘result’* $= \textbf{0}$, then **no**
            *‘Amphora types’* were found in the given year. There is no
            need to add $0$ to the dictionary, since the value $0$ is
            already assigned to each year (see step **2**).

        2.  If the length of *‘result’* $\geq \textbf{1}$, the nested
            lists are flattened, the unique *‘Site’* names are
            identified with the `set()` function, and the number of
            unique names is obtained with the `len()` function.

        3.  The resulting *‘Site’* count is stored in the dictionary as
            the value for the corresponding year.
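
The steps above can be sketched as follows. This is a minimal illustration and not the original implementation: the toy rows, the helper name `site_count_per_year` and the choice of one row per site are assumptions, and `.apply(list)` is used here in place of the `split()` step described in the text. The dictionary keys stop one year before the latest upper date, mirroring the $-25$ to $499$ example.

```python
import pandas as pd

# Hypothetical toy data: column names follow the text, values are invented.
df = pd.DataFrame({
    "Site": ["site a", "site b", "site a", "site c"],
    "Amphora type": ["type 1", "type 1", "type 2", "type 2"],
    "Amphora lower date": [-2, -2, 0, 0],
    "Amphora upper date": [1, 1, 3, 3],
    "Provenance": ["egypt"] * 4,
})


def site_count_per_year(df, provenance):
    """Count unique sites per year for one provenance (illustrative helper)."""
    sub = df[df["Provenance"] == provenance]

    # Step 1: one (not yet unique) list of sites per amphora type and date range.
    grouped = (
        sub.groupby(["Amphora type", "Amphora lower date", "Amphora upper date"])["Site"]
        .apply(list)
        .reset_index()
    )

    # Step 2: years as keys, zeros as values (last upper date excluded).
    first = int(grouped["Amphora lower date"].min())
    last = int(grouped["Amphora upper date"].max())
    counts = {year: 0 for year in range(first, last)}

    # Steps 3 and 4: collect the site lists of every matching date range,
    # then count the unique site names.
    for year in counts:
        result = []
        for lower, upper, sites in zip(
            grouped["Amphora lower date"],
            grouped["Amphora upper date"],
            grouped["Site"],
        ):
            if lower <= year <= upper:
                result.append(sites)
        if result:
            unique_sites = {site for sites in result for site in sites}
            counts[year] = len(unique_sites)
    return counts


counts = site_count_per_year(df, "egypt")
print(counts)
```

In the toy data, years $0$ and $1$ are covered by both amphora types, so the shared `site a` is counted only once for those years.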

**2. Summed Frequency**

-   *Example query:*

    Show sum of *‘Frequency’* per *‘Provenance’* per *‘Year’*.

-   *Steps:*

    1.  Group *‘Amphora types’* by *‘Provenance’*.

        This is done with the pandas `DataFrame.groupby()` method.

    2.  Sum *‘Frequencies’* across each *‘Amphora type’* found in a
        particular *‘Provenance’*.

        To that end, the `sum()` method is applied to the grouped data.

        *Example:* In the Sonata dataset, for `Amphora type == AE 3`
        present in `Provenance == Egypt` dated $-25.0$ to $500.0$, the
        summed *‘Frequency’* is equal to 5.0.

    3.  Calculate *‘Frequency’* per *‘Year’* for each *‘Amphora type’*
        found in a certain *‘Provenance’*.

        The equation is written as,

        \begin{equation}
        \frac{\text{Amphora type summed frequency}}{\text{Amphora type date range}}.
        \end{equation}

        Here an amphora date range is defined as,

        \begin{equation}
        \textit{Amphora upper date} - \textit{Amphora lower date}.
        \end{equation}

        *Example:* The *‘Frequency’* per *‘Year’* for *‘AE 3’* present in
        *‘Egypt’* is calculated as,

        \begin{equation}
        \frac{5.0}{500-(-25)} = \frac{5.0}{525} = 0.0095238095.
        \end{equation}

    4.  Define a dictionary with years (as keys), ranging from *‘Amphora
        lower date’* to *‘Amphora upper date’* (across the dataset for a
        given *‘Provenance’*), and zeros (as values).

        Later, the *‘Amphora type’* summed *‘Frequency’* per *‘Year’*
        is added to the value for each year.

        *Example:* There are 7 *‘Amphora types’* present in Egypt and
        therefore 7 amphora date ranges. As a result, the dictionary
        contains years ranging from $-25$ to $499$,

        \begin{equation}
        dict = \{-25:0,  -24: 0, \ldots , 499: 0\}.
        \end{equation}

    5.  Iterate over amphora types’ date ranges (related to a given
        *‘Provenance’*).

        1.  Assign *‘Amphora type’* summed *‘Frequency’* per *‘Year’*
            value to the corresponding year in a dictionary. If several
            *‘Amphora types’* fall within the same year, their summed
            *‘Frequency’* values per *‘Year’* will be added up.

            *Example:* In the case of Egypt, year $-25$ falls within the
            date ranges of 4 amphora types (AE 3, Dr 2-4/Pompeii 5,
            Egyptian, bi-troncoconica). Summing their frequency values
            per year gives 0.03628371628371629.
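
A minimal sketch of the five steps above is given below. It is not the original code: the toy rows and the helper name `frequency_per_year` are assumptions, and each year key is taken to cover the half-open range from the lower date up to (but not including) the upper date, in line with the $-25$ to $499$ dictionary in the example.

```python
import pandas as pd

# Hypothetical toy data: column names follow the text, values are invented.
df = pd.DataFrame({
    "Amphora type": ["type 1", "type 1", "type 2"],
    "Amphora lower date": [0, 0, 2],
    "Amphora upper date": [4, 4, 4],
    "Provenance": ["egypt"] * 3,
    "Frequency": [1.0, 3.0, 1.0],
})


def frequency_per_year(df, provenance):
    """Sum frequency-per-year values over all amphora types of a provenance."""
    sub = df[df["Provenance"] == provenance]

    # Steps 1 and 2: summed frequency per amphora type (and its date range).
    summed = (
        sub.groupby(["Amphora type", "Amphora lower date", "Amphora upper date"])["Frequency"]
        .sum()
        .reset_index()
    )

    # Step 3: summed frequency divided by (upper date - lower date).
    span = summed["Amphora upper date"] - summed["Amphora lower date"]
    summed["Frequency per year"] = summed["Frequency"] / span

    # Step 4: years as keys, zeros as values.
    first = int(summed["Amphora lower date"].min())
    last = int(summed["Amphora upper date"].max())
    totals = {year: 0.0 for year in range(first, last)}

    # Step 5: add each type's per-year value to every year in its range.
    for lower, upper, per_year in zip(
        summed["Amphora lower date"],
        summed["Amphora upper date"],
        summed["Frequency per year"],
    ):
        for year in range(int(lower), int(upper)):
            totals[year] += float(per_year)
    return totals


totals = frequency_per_year(df, "egypt")
print(totals)
```

With the half-open year range, the per-year values of the toy data add up to the total frequency of 5.0, which is a convenient sanity check. A date range of zero length would cause a division by zero and needs separate handling.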


**3. Maps**

-   *Example queries:*

    Plot as dots on a spatial map all *‘Site’* locations which have a
    certain *‘Provenance’* for a given period of time.

    Show the summed *‘Frequency’* as the size/colour of the dot.

    Determine for each *‘Year’* the *‘Sites’* on which there is evidence
    of an *‘Amphora type’* with a certain *‘Provenance’*.

-   *Steps:*

    1.  Identify *‘Sites’* belonging to a *‘Provenance’* of interest.

        This is done by limiting the dataframe to the rows and columns
        that contain information related to the *‘Provenance’* in
        question.

        *Example:* In the Sonata dataset, the restriction to
        `Provenance == Africa` can be performed with
        `df_africa = df[df['Provenance'] == 'africa']`.

    2.  Calculate *‘Frequency’* per *‘Year’* per *‘Amphora type’* (see
        Sec. **2**, step **3**).

    3.  Calculate the share of that *‘Frequency’* which falls within a
        given map period.

        1.  Compare a map date range with an amphora type date range and
            identify the length of intersection, i.e.,

            \begin{equation}
            \text{the size of (Map date range} \cap \text{Amphora type date range)}. 
            \end{equation}

            The resulting value indicates the number of years (for each
            amphora) that fall within a map date range.

        2.  Multiply the *‘Frequency’* per *‘Year’* value (calculated in
            step **2**) by the obtained length of intersection.

            A value of $0$ shows that the amphora type date range does
            not overlap the map time period, so such amphorae are not
            attested in the given timeframe. A value $> 0$ indicates
            that the amphora type date range falls either fully or
            partially within the map date range.

        3.  Restrict the dataframe to rows with proportion $>$ 0.

    4.  Calculate summed *‘Frequency’* (of *‘Amphora types’*) per
        *‘Site’*.

        1.  Group the proportion values obtained in step **3** by
            *‘Site’*, *‘Latitude’* and *‘Longitude’* with the pandas
            `DataFrame.groupby()` method.

        2.  Sum the grouped proportion values for each *‘Site’* with the
            `sum()` method.

    5.  Count the number of unique *‘Amphora types’* per *‘Site’*.

        1.  Group *‘Amphora types’* by *‘Site’* with the pandas
            `DataFrame.groupby()` method.

        2.  Count the unique *‘Amphora types’* in each group with the
            `nunique()` method (or `unique()` followed by `len()`).
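
The map preparation can be sketched as below. This is an illustration under stated assumptions, not the original code: the toy rows, the helper name `site_summary` and the output column names are hypothetical, and each row is assumed to hold one *‘Amphora type’* at one *‘Site’* with its own *‘Frequency’*. The length of the intersection is computed as the smaller upper date minus the larger lower date, floored at zero.

```python
import pandas as pd

# Hypothetical toy data: column names follow the text, values are invented.
df = pd.DataFrame({
    "Site": ["site a", "site a", "site b", "site c"],
    "Latitude": [41.0, 41.0, 36.0, 30.0],
    "Longitude": [12.0, 12.0, 10.0, 31.0],
    "Amphora type": ["type 1", "type 2", "type 1", "type 3"],
    "Amphora lower date": [0, 100, 0, 300],
    "Amphora upper date": [100, 200, 100, 400],
    "Provenance": ["africa"] * 4,
    "Frequency": [10.0, 20.0, 5.0, 4.0],
})


def site_summary(df, provenance, map_lower, map_upper):
    """Per-site summed frequency and amphora type count for one map period."""
    # Step 1: restrict to the provenance of interest.
    sub = df[df["Provenance"] == provenance].copy()

    # Step 2: frequency per year for every row.
    span = sub["Amphora upper date"] - sub["Amphora lower date"]
    sub["Frequency per year"] = sub["Frequency"] / span

    # Step 3: years shared by the map range and the amphora range,
    # multiplied by the frequency per year; rows without overlap are dropped.
    overlap = (
        sub["Amphora upper date"].clip(upper=map_upper)
        - sub["Amphora lower date"].clip(lower=map_lower)
    ).clip(lower=0)
    sub["Proportion"] = sub["Frequency per year"] * overlap
    sub = sub[sub["Proportion"] > 0]

    # Steps 4 and 5: aggregate per site.
    keys = ["Site", "Latitude", "Longitude"]
    summed = sub.groupby(keys)["Proportion"].sum()
    n_types = sub.groupby(keys)["Amphora type"].nunique()
    return pd.DataFrame(
        {"Summed frequency": summed, "Amphora types": n_types}
    ).reset_index()


summary = site_summary(df, "africa", 50, 150)
print(summary)
```

The resulting table has one row per *‘Site’* with its coordinates, so it can be passed directly to a plotting routine that uses the summed *‘Frequency’* for the dot size or colour. In the toy data, `site c` is dropped because its date range lies entirely outside the map period.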