# Get various data from LMFDB using `lmfdb_lite`

In this notebook, we learn how to query LMFDB using [`lmfdb_lite`](https://github.com/roed314/lmfdb-lite). In particular, we will query data for Artin representations and abelian varieties over finite fields.

In [None]:
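import pathlib

import polars as pl
from tqdm import tqdm
from lmf import db

try:  # `primes_first_n` (used in Q3) is built into SageMath; fall back to SymPy elsewhere
    from sage.all import primes_first_n
except ImportError:
    from sympy import prime
    def primes_first_n(n):
        return [prime(i) for i in range(1, n + 1)]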

## 1. Artin representations

There are two databases related to Artin representations: `artin_reps` and `artin_field_data`. The "schema" of each table can be found under `Underlying data` on any Artin representation page (for example, see [here](https://www.lmfdb.org/ArtinRepresentation/data/2.163.8t12.a)).
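Base labels also encode some invariants directly. Below is a minimal sketch that assumes a base label has the form `d.N.nTt.x` (dimension, conductor, transitive-group label of the Galois group, isomorphism-class letters), as the example `2.163.8t12.a` suggests. `parse_artin_baselabel` and `ArtinBaseLabel` are hypothetical helpers and are not part of `lmfdb_lite`.

```python
from typing import NamedTuple


class ArtinBaseLabel(NamedTuple):
    dim: int
    conductor: int
    galois_group: str  # transitive group label such as "8t12"
    iso_class: str


def parse_artin_baselabel(label: str) -> ArtinBaseLabel:
    """Split a base label of the (assumed) form d.N.nTt.x into its parts."""
    dim, conductor, group, iso_class = label.split(".", 3)
    return ArtinBaseLabel(int(dim), int(conductor), group, iso_class)


print(parse_artin_baselabel("2.163.8t12.a"))
# ArtinBaseLabel(dim=2, conductor=163, galois_group='8t12', iso_class='a')
```

After a query, comparing the parsed values with the `Dim` and `Conductor` columns is a cheap sanity check.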

In [None]:
artin_db = db.artin_reps
artin_field_db = db.artin_field_data

Q1. Query and save the 3-dimensional Artin representations. The columns will be `Baselabel`, `Dim`, `Conductor`, `Is_Even`, `Indicator`.

1. What is the total number of representations?

2. What is the smallest conductor that occurs? (See [completeness](https://www.lmfdb.org/ArtinRepresentation/Completeness).)

3. Among them, find the number of even and odd representations.

4. Count the number of representations for each possible value of the Frobenius-Schur indicator.


In [None]:
def query_artin(d, limit=None):
    cols = "Your columns here"
    search_data = artin_db.search({"Dim": d}, cols, limit=limit)
    result = []
    for r in tqdm(search_data):
        # Your code here
        label = None
        dimension = None
        conductor = None
        is_even = None
        fs_indicator = None

        result.append(
            {
                "Baselabel": label,
                "Dim": dimension,
                "Conductor": conductor,
                "Is_Even": is_even,
                "Indicator": fs_indicator,
            }
        )
    schema = [
        ("Baselabel", pl.String),
        ("Dim", pl.Int8),
        ("Conductor", pl.Int128),
        ("Is_Even", pl.Int8),
        ("Indicator", pl.Int8),
    ]
    return pl.DataFrame(result, schema=schema)

In [None]:
# Your code here

Q2. Using a *single* query, find the number of Artin representations such that

- dimension is 4
- even
- conductor $\le 10^6$

[This](https://github.com/roed314/psycodict/blob/master/QueryLanguage.md) page might be helpful.
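psycodict queries are plain Python dicts. A bare value is an equality test, and a nested dict applies comparison operators such as `$lte` or `$in` to one column. All top-level conditions are combined with AND. To make these semantics concrete without touching the database, here is a hypothetical local matcher for a small subset of operators. It only sketches the behaviour described on the linked page and is not psycodict's implementation. The example query is deliberately different from Q2.

```python
from typing import Any


def matches(record: dict[str, Any], query: dict[str, Any]) -> bool:
    """Hypothetical matcher for a tiny subset of the psycodict query language.

    A plain value means equality; a dict applies operators ($lte, $lt, $gte,
    $gt, $ne, $in) to that column. Every condition must hold (logical AND).
    """
    ops = {
        "$lte": lambda x, v: x <= v,
        "$lt": lambda x, v: x < v,
        "$gte": lambda x, v: x >= v,
        "$gt": lambda x, v: x > v,
        "$ne": lambda x, v: x != v,
        "$in": lambda x, v: x in v,
    }
    for column, condition in query.items():
        value = record.get(column)
        if isinstance(condition, dict):
            if not all(ops[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            return False
    return True


# 2-dimensional, conductor in [100, 200), indicator 1 or -1.
example_query = {"Dim": 2, "Conductor": {"$gte": 100, "$lt": 200}, "Indicator": {"$in": [1, -1]}}
print(matches({"Dim": 2, "Conductor": 163, "Indicator": 1}, example_query))  # True
print(matches({"Dim": 2, "Conductor": 250, "Indicator": 1}, example_query))  # False
```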


In [None]:
# Your query here
query = None

result = list(artin_db.search(query))  # Note that the result is a generator, so we convert it to a list
print(len(result))

Q3. Using both `artin_reps` and `artin_field_data`, compute the Frobenius traces $a_p$ of the Artin representations and add them as columns `a_p` for $p \le 1000$. Only consider representations whose Galois orbit has a single element, i.e. whose character values are rational (and hence integers). The following columns will be useful:

- `Frobs` of `artin_field_data`: a list of integers. If the $i$-th entry ($i \ge 1$) is $j$, then the Frobenius at $p_i$ (the $i$-th prime) lies in the $j$-th conjugacy class.

- `Character` of `artin_field_data`: a list of lists of integers containing the character values on each conjugacy class.
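The index bookkeeping in Q3 does not depend on the database, so you can test it on toy input first. This is a minimal sketch with these assumptions:

- `first_n_primes` and `frobenius_traces` are hypothetical helpers. In the notebook, `primes_first_n` plays the role of the former.
- Conjugacy classes are numbered from 1.
- Following the notebook's code, $a_p$ is set to $0$ at primes dividing the conductor. This is a simplification: strictly, $a_p$ at a ramified prime is the trace of Frobenius on the inertia invariants.

The character values are those of the 2-dimensional irreducible representation of $S_3$. The `Frobs` list and the conductor are made up.

```python
def first_n_primes(n: int) -> list[int]:
    """Return the first n primes by trial division (fine for n around 168)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no smaller prime p with p*p <= candidate divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def frobenius_traces(frobs: list[int], char_values: list[int], conductor: int) -> list[int]:
    """a_p = chi(Frob_p) for unramified p, and 0 for p dividing the conductor.

    frobs[i] = j means the Frobenius at the (i+1)-th prime lies in class j,
    assumed numbered from 1, so its character value is char_values[j - 1].
    """
    traces = []
    for p, j in zip(first_n_primes(len(frobs)), frobs):
        if conductor % p == 0:  # ramified prime
            traces.append(0)
        else:
            traces.append(char_values[j - 1])
    return traces


# Character of the 2-dim irreducible rep of S3 on [identity, transpositions, 3-cycles].
chi = [2, 0, -1]
print(frobenius_traces([3, 3, 2, 1, 2], chi, conductor=10))  # [0, -1, 0, 2, 0]
```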

In [None]:
def query_artin_by_label(labels):
    """Query by Baselabels."""
    query = {"Baselabel": {'$in': labels}}
    cols = ['BadPrimes', 'Baselabel', 'Conductor', 'Dim', 'Indicator', 'Is_Even']
    result = artin_db.search(query, cols)
    return result

def query_artin_field_db(d, limit=None):
    cols = ['ArtinReps', 'ComplexConjugation', 'ConjClasses', 'Frobs']
    search_data = artin_field_db.search({}, cols, limit=limit)  # filtering by `d` happens in the loop
    result = []
    for r in tqdm(search_data):
        for rep in r['ArtinReps']:
            # Your code here
            label = None
            rep_d = None
            rep_cond = None
            
            # Filter by dimension
            # Your code here

            # Only single galois orbit
            # Your code here
            char_dict = None

            p_to_frob = []
            frob_idx = list(r['Frobs'])[:168]  # keep the first 168 primes, i.e. p <= 1000
            ps = primes_first_n(len(frob_idx))
            for i, p in enumerate(ps):
                if rep_cond % p != 0:  # unramified
                    p_to_frob.append(char_dict[frob_idx[i]])
                else:  # ramified (simplification: a_p is recorded as 0)
                    p_to_frob.append(0)

            result.append(
                {
                    'Baselabel': label,
                    'Dim': rep_d,
                    'Conductor': rep_cond,
                    'Character': char_dict,
                    'P_to_Frob': p_to_frob,
                }
            )
    return result

In [None]:
data_from_field_db = query_artin_field_db(d=3, limit=None)

schema = [
    ('Baselabel', pl.String),
    ('Dim', pl.Int8),
    ('Conductor', pl.Int128),
    ('Is_Even', pl.Int8),
    ('Indicator', pl.Int8)
] + [(f'a_{p:03d}', pl.Int8) for p in primes_first_n(168)]
artin_data = []
chunk_size = 1000
N = len(data_from_field_db)

# Process in chunks to avoid large queries
for i in tqdm(range(0, N, chunk_size)):
    labels_chunk = [r['Baselabel'] for r in data_from_field_db[i:i+chunk_size]]
    artin_reps_chunk = query_artin_by_label(labels_chunk)
    artin_rep_dict = {r['Baselabel']: r for r in artin_reps_chunk}
    
    for r in data_from_field_db[i:i+chunk_size]:
        r_data = artin_rep_dict[r['Baselabel']]
        row = [r['Baselabel'], r['Dim'], r['Conductor'], int(r_data['Is_Even']), r_data['Indicator']] + r['P_to_Frob']
        artin_data.append(row)

df = pl.DataFrame(artin_data, schema=schema, orient="row")  # each entry of artin_data is one row
df.write_csv(pathlib.Path.cwd() / "artin_reps_d3.csv")

## 2. Abelian varieties over finite fields

LMFDB also has a database of isogeny classes of abelian varieties over finite fields.

In [None]:
av_db = db.av_fq_isog

Q4. Query and save the isogeny classes of abelian varieties of dimension 2 over $\mathbb{F}_5$. The columns will be `label, a0, a1, a2, a3, a4, p_rank, is_simple, has_principal_polarization, has_jacobian`, where `a0`-`a4` are the coefficients of the $L$-polynomial $L_A(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4$.

1. What is the total number of such abelian varieties?

2. What is the number of simple abelian varieties?

3. Compute the number of abelian varieties for each possible $p$-rank, where

$$
\mathrm{rank}_p(A) := \dim_{\mathbb{F}_p}(A(\overline{\mathbb{F}_p})[p])
$$

4. How many of them are principally polarizable? How many of them contain the Jacobian of a curve?

5. Find the largest and smallest values of $a_3$.
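It can help to check that `poly` was unpacked into `a0`-`a4` in the intended order. The relevant facts are:

- $L_A(t) = \prod_{i=1}^{2g}(1-\alpha_i t)$ with $|\alpha_i| = \sqrt{q}$.
- The $\alpha_i$ are closed under $\alpha \mapsto \bar\alpha = q/\alpha$.

So every row must satisfy $a_0 = 1$, $a_{2g-k} = q^{g-k} a_k$ (here $a_3 = 5a_1$ and $a_4 = 25$) and $|a_1| \le 2g\sqrt{q}$. The sketch below checks these conditions. `check_l_polynomial` is a hypothetical helper, and the conditions are necessary but not sufficient.

```python
def check_l_polynomial(coeffs: list[int], q: int) -> bool:
    """Check necessary (not sufficient) conditions on the coefficients
    [a_0, a_1, ..., a_{2g}] of the L-polynomial of a g-dimensional
    abelian variety over F_q."""
    if len(coeffs) < 3 or len(coeffs) % 2 == 0:
        return False  # degree 2g with g >= 1 means an odd count >= 3
    g = (len(coeffs) - 1) // 2
    if coeffs[0] != 1:
        return False  # L(0) = 1
    # Functional equation: a_{2g-k} = q^(g-k) * a_k for k = 0, ..., g.
    for k in range(g + 1):
        if coeffs[2 * g - k] != q ** (g - k) * coeffs[k]:
            return False
    # Weil bound |a_1| <= 2g * sqrt(q), squared to stay in integer arithmetic.
    return coeffs[1] ** 2 <= 4 * g * g * q


print(check_l_polynomial([1, 2, 3, 10, 25], q=5))  # True
print(check_l_polynomial([25, 10, 3, 2, 1], q=5))  # False: coefficients reversed
```

If every row fails because `a0 == 25`, the coefficients were read in reverse order.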

In [None]:
def query_av(q, g, limit=None):
    cols = ["label", "poly", "p_rank", "is_simple", "has_principal_polarization", "has_jacobian"]
    search_data = av_db.search({"q": q, "g": g}, cols, limit=limit)
    result = []
    for r in tqdm(search_data):
        # Your code here
        label = None
        a0, a1, a2, a3, a4 = None, None, None, None, None
        p_rank = None
        is_simple = None
        has_principal_polarization = None
        has_jacobian = None

        result.append(
            {
                "label": label,
                "a0": a0,
                "a1": a1,
                "a2": a2,
                "a3": a3,
                "a4": a4,
                "p_rank": p_rank,
                "is_simple": is_simple,
                "has_principal_polarization": has_principal_polarization,
                "has_jacobian": has_jacobian,
            }
        )
    schema = [
        ("label", pl.String),
        ("a0", pl.Int64),
        ("a1", pl.Int64),
        ("a2", pl.Int64),
        ("a3", pl.Int64),
        ("a4", pl.Int64),
        ("p_rank", pl.Int8),
        ("is_simple", pl.Int8),
        ("has_principal_polarization", pl.Int8),
        ("has_jacobian", pl.Int8),
    ]
    return pl.DataFrame(result, schema=schema)

In [None]:
# Your code here

Bonus questions.

- Build a machine learning model that predicts the parity of Artin representations from their Frobenius traces.
- For a fixed $p$, build a machine learning model that predicts $a_{p}$ of Artin representations from all $a_{p'}$ with $p' < p$.
- Build a machine learning model that predicts the $p$-rank from $L$-polynomials.
- Build a machine learning model that predicts polarizability from $L$-polynomials.
- Try querying other data. The [LMFDB API](https://www.lmfdb.org/api/) and [psycodict query language](https://github.com/roed314/psycodict/blob/master/QueryLanguage.md) pages might be helpful.
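Whichever model you choose, compare it with a trivial baseline. Below is a minimal plain-Python sketch of a majority-class baseline with a seeded random train/test split. `split_train_test` and `majority_baseline_accuracy` are hypothetical helpers, and the toy labels are made up.

```python
import random
from collections import Counter


def split_train_test(rows: list, test_fraction: float = 0.2, seed: int = 0) -> tuple[list, list]:
    """Shuffle a copy of rows with a fixed seed and split off a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def majority_baseline_accuracy(train_labels: list, test_labels: list) -> float:
    """Accuracy on test_labels of always predicting the most common training label."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return sum(label == majority for label in test_labels) / len(test_labels)


# Toy parity labels (1 = even, 0 = odd); with real data use e.g. df["Is_Even"].to_list().
labels = [0] * 70 + [1] * 30
train, test = split_train_test(labels, test_fraction=0.2, seed=42)
print(majority_baseline_accuracy(train, test))
```

A model that does not clearly beat this number has not learned anything beyond the class balance.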