# Get various data from LMFDB using `lmfdb_lite`

In this notebook, we learn how to query LMFDB using [`lmfdb_lite`](https://github.com/roed314/lmfdb-lite). In particular, we will query data for Artin representations and abelian varieties over finite fields.

In [1]:
import pathlib

import polars as pl
from tqdm import tqdm

from lmf import db
from sage.all import primes_first_n  # used in Q3; already in scope under a Sage kernel

## 1. Artin representations

There are two tables related to Artin representations: `artin_reps` and `artin_field_data`. The schema for each table is listed under `Underlying data` on any Artin representation page (e.g. see [here](https://www.lmfdb.org/ArtinRepresentation/data/2.163.8t12.a)).

In [2]:
artin_db = db.artin_reps
artin_field_db = db.artin_field_data

Q1. Query and save the 3-dimensional Artin representations. The columns will be `Baselabel, Dim, Conductor, Is_Even, Indicator`.

1. What is the total number of representations?

2. What is the smallest possible conductor? (See [completeness](https://www.lmfdb.org/ArtinRepresentation/Completeness).)

3. Among them, find the number of even/odd representations.

4. Count the number of representations for each possible Frobenius-Schur indicator.
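
Once the rows are in memory, the four sub-questions reduce to a length, a minimum, and two tallies. The sketch below works through them in plain Python on the first five rows of the `Dim = 3` result shown in the output of this section; the `rows` list and the variable names are illustrative only, and the notebook itself does the same with polars `group_by`.

```python
from collections import Counter

# First five rows of the Dim = 3 result, as (Baselabel, Conductor, Is_Even, Indicator).
rows = [
    ("3.229.4t5.a", 229, 1, 1),
    ("3.229.4t5.b", 229, 1, 1),
    ("3.257.4t5.b", 257, 1, 1),
    ("3.257.4t5.a", 257, 1, 1),
    ("3.283.4t5.a", 283, 0, 1),
]

total = len(rows)
min_conductor = min(cond for _, cond, _, _ in rows)
# Tally parity and Frobenius-Schur indicator separately.
parity_counts = Counter("even" if is_even else "odd" for _, _, is_even, _ in rows)
indicator_counts = Counter(ind for _, _, _, ind in rows)

print(total, min_conductor, dict(parity_counts), dict(indicator_counts))
```

On the full table, the same four quantities come from `len(df)`, a column minimum, and two grouped counts.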


In [3]:
def query_artin(d, limit=None):
    cols = ["Baselabel", "Dim", "Conductor", "Is_Even", "Indicator"]
    search_data = artin_db.search({"Dim": d}, cols, limit=limit)
    result = []
    for r in tqdm(search_data):
        label = r["Baselabel"]
        dimension = r["Dim"]
        conductor = r["Conductor"]
        is_even = r["Is_Even"]
        fs_indicator = r["Indicator"]
        result.append(
            {
                "Baselabel": label,
                "Dim": dimension,
                "Conductor": conductor,
                "Is_Even": is_even,
                "Indicator": fs_indicator,
            }
        )
    schema = [
        ("Baselabel", pl.String),
        ("Dim", pl.Int8),
        ("Conductor", pl.Int128),
        ("Is_Even", pl.Int8),
        ("Indicator", pl.Int8),
    ]
    return pl.DataFrame(result, schema=schema)

In [4]:
df = query_artin(3, limit=None)
print(df)
print(f"Total number of representations: {len(df)}")
print(f"Smallest possible conductor: {df['Conductor'].min()}")
print(df.group_by("Is_Even").count())
print(df.group_by("Indicator").count())

39913it [00:04, 9289.85it/s] 

shape: (39_913, 5)
┌─────────────────────────────────┬─────┬───────────────────────┬─────────┬───────────┐
│ Baselabel                       ┆ Dim ┆ Conductor             ┆ Is_Even ┆ Indicator │
│ ---                             ┆ --- ┆ ---                   ┆ ---     ┆ ---       │
│ str                             ┆ i8  ┆ i128                  ┆ i8      ┆ i8        │
╞═════════════════════════════════╪═════╪═══════════════════════╪═════════╪═══════════╡
│ 3.229.4t5.a                     ┆ 3   ┆ 229                   ┆ 1       ┆ 1         │
│ 3.229.4t5.b                     ┆ 3   ┆ 229                   ┆ 1       ┆ 1         │
│ 3.257.4t5.b                     ┆ 3   ┆ 257                   ┆ 1       ┆ 1         │
│ 3.257.4t5.a                     ┆ 3   ┆ 257                   ┆ 1       ┆ 1         │
│ 3.283.4t5.a                     ┆ 3   ┆ 283                   ┆ 0       ┆ 1         │
│ …                               ┆ …   ┆ …                     ┆ …       ┆ …         │
│ 3.473490924




Q2. Using a *single* query, find the number of Artin representations such that

- dimension is 4
- even
- conductor $\le 10^6$

[This](https://github.com/roed314/psycodict/blob/master/QueryLanguage.md) page might be helpful.
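
To make the shape of such a query concrete, here is a toy evaluator for the two operators that appear in this notebook, `$lte` and `$in`. The `matches` helper and the three sample rows are hypothetical and run locally on plain dicts; this is not how psycodict evaluates a query, which is run by the database. The last line also shows why the bound is safer written as `10**6`: in plain Python `^` is bitwise XOR, whereas the Sage preparser turns it into exponentiation.

```python
def matches(row, query):
    """Toy check of a row (dict) against a psycodict-style query dict.

    Only plain equality, `$lte`, and `$in` are handled; this is an
    illustration, not the library's implementation.
    """
    for col, cond in query.items():
        value = row[col]
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op == "$lte":
                    ok = value <= arg
                elif op == "$in":
                    ok = value in arg
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif value != cond:
            return False
    return True


query = {"Dim": 4, "Is_Even": True, "Conductor": {"$lte": 10**6}}

# Hypothetical rows, only to exercise the three conditions.
small = {"Dim": 4, "Is_Even": True, "Conductor": 999_999}
large = {"Dim": 4, "Is_Even": True, "Conductor": 1_000_001}
odd = {"Dim": 4, "Is_Even": False, "Conductor": 500}

print(matches(small, query), matches(large, query), matches(odd, query))

# In plain Python, ^ is XOR, so 10 ^ 6 is 12, not one million.
print(10 ^ 6, 10**6)
```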


In [5]:
query = {
    "Dim": 4,
    "Is_Even": True,
    # Use ** rather than ^: in plain Python ^ is XOR; ** works in Sage as well.
    "Conductor": {'$lte': 10**6},
}
result = list(artin_db.search(query))
print(len(result))

7824


Q3. Using both `artin_reps` and `artin_field_data`, compute the Frobenius traces $a_p$ of Artin representations for $p \le 1000$, and add them as columns `a_p`. Only consider representations with a single Galois orbit (in the code below, those whose `CharacterField` equals 1). The following columns will be useful:

- `Frobs` of `artin_field_data`. This is a list of integers. If the $i$-th element ($i \ge 1$) is $j$, then the Frobenius at $p_i$ (the $i$-th prime) lies in the $j$-th conjugacy class.

- `Character` of `artin_field_data` (stored inside each entry of `ArtinReps`). This is a list of lists of integers, which contains the character values on each conjugacy class.
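
The lookup described in the two bullets can be sketched as follows. The `frobs` and `character` lists below are small made-up examples, not LMFDB rows, and `first_primes` is a hypothetical stand-in for Sage's `primes_first_n`. Following the notebook's code, the first entry of each inner `Character` list is taken as the character value.

```python
def first_primes(n):
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


# Made-up example: 5 conjugacy classes, Frobenius classes for the first 5 primes.
character = [[3], [-1], [0], [1], [-1]]  # value on class j is character[j - 1][0]
frobs = [2, 3, 1, 5, 4]  # i-th entry: 1-based class index of Frobenius at p_i

# a_p is the character value on the class containing Frobenius at p.
traces = {
    p: character[j - 1][0]
    for p, j in zip(first_primes(len(frobs)), frobs)
}
print(traces)
```

The class indices are 1-based while Python lists are 0-based, which is why the notebook builds `char_dict` with keys `i+1`.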

In [6]:
def query_artin_by_label(labels):
    """Query by Baselabels."""
    query = {"Baselabel": {'$in': labels}}
    cols = ['BadPrimes', 'Baselabel', 'Conductor', 'Dim', 'Indicator', 'Is_Even', 'GalConjSigns']
    result = artin_db.search(query, cols)
    return result

def query_artin_field_db(d=3, limit=None):
    # Project onto the two columns that are actually read below.
    cols = ['ArtinReps', 'Frobs']
    search_data = artin_field_db.search({}, cols, limit=limit)
    result = []
    labels = set()
    for r in tqdm(search_data):
        for rep in r['ArtinReps']:

            label = rep['Baselabel']
            if label in labels:  # Avoid duplicates
                continue
            labels.add(label)

            rep_d = int(label.split('.')[0])
            rep_cond = int(label.split('.')[1])
            if d is not None and rep_d != d:
                continue
            
            # Only single galois orbit
            if rep['CharacterField'] != 1:
                continue
            char_dict = {i+1: c[0] for i, c in enumerate(rep['Character'])}

            # Frobs[i] is the (1-based) conjugacy class of Frobenius at the i-th prime.
            p_to_frob = [char_dict[j] for j in r['Frobs']]

            result.append(
                {
                    'Baselabel': label,
                    'Dim': rep_d,
                    'Conductor': rep_cond,
                    'Character': char_dict,
                    'P_to_Frob': p_to_frob,
                }
            )
    return result

In [8]:
d = 3
data_from_field_db = query_artin_field_db(d=d, limit=None)

schema = [
    ('Baselabel', pl.String),
    ('Dim', pl.Int8),
    ('Conductor', pl.Int128),
    ('Is_Even', pl.Int8),
    ('Indicator', pl.Int8),
    ('Rootnumber', pl.Int8),
] + [(f'a_{p:03d}', pl.Int8) for p in primes_first_n(168)]
artin_data = []
chunk_size = 1000
N = len(data_from_field_db)
print(f"Total number of Artin representations from field db: {N}")

# Process in chunks to avoid large queries
for i in tqdm(range(0, N, chunk_size)):
    labels_chunk = [r['Baselabel'] for r in data_from_field_db[i:i+chunk_size]]
    artin_reps_chunk = query_artin_by_label(labels_chunk)
    artin_rep_dict = {r['Baselabel']: r for r in artin_reps_chunk}
    
    for r in data_from_field_db[i:i+chunk_size]:
        r_data = artin_rep_dict[r['Baselabel']]
        row = [r['Baselabel'], r['Dim'], r['Conductor'], int(r_data['Is_Even']), r_data['Indicator'], r_data['GalConjSigns'][0]] + r['P_to_Frob']
        artin_data.append(row)

df = pl.DataFrame(artin_data, schema=schema)
df = df.sort('Conductor')
print(df)
df.write_csv(pathlib.Path.cwd() / f"artin_reps_d{d}.csv")

606677it [06:53, 1468.63it/s]


Total number of Artin representations from field db: 30324


100%|██████████| 31/31 [00:32<00:00,  1.04s/it]


shape: (30_324, 174)
┌────────────────────────┬─────┬─────────────────────┬─────────┬───┬───────┬───────┬───────┬───────┐
│ Baselabel              ┆ Dim ┆ Conductor           ┆ Is_Even ┆ … ┆ a_977 ┆ a_983 ┆ a_991 ┆ a_997 │
│ ---                    ┆ --- ┆ ---                 ┆ ---     ┆   ┆ ---   ┆ ---   ┆ ---   ┆ ---   │
│ str                    ┆ i8  ┆ i128                ┆ i8      ┆   ┆ i8    ┆ i8    ┆ i8    ┆ i8    │
╞════════════════════════╪═════╪═════════════════════╪═════════╪═══╪═══════╪═══════╪═══════╪═══════╡
│ 3.229.4t5.a            ┆ 3   ┆ 229                 ┆ 1       ┆ … ┆ -1    ┆ 1     ┆ 0     ┆ 0     │
│ 3.229.4t5.b            ┆ 3   ┆ 229                 ┆ 1       ┆ … ┆ -1    ┆ 1     ┆ 0     ┆ 0     │
│ 3.257.4t5.a            ┆ 3   ┆ 257                 ┆ 1       ┆ … ┆ -1    ┆ 1     ┆ 1     ┆ -1    │
│ 3.257.4t5.b            ┆ 3   ┆ 257                 ┆ 1       ┆ … ┆ -1    ┆ 1     ┆ 1     ┆ -1    │
│ 3.283.4t5.b            ┆ 3   ┆ 283                 ┆ 0       ┆ … ┆ -

## 2. Abelian varieties over finite fields

LMFDB also has a database of isogeny classes of abelian varieties over finite fields.

In [9]:
av_db = db.av_fq_isog

Q4. Query and save the abelian varieties over $\mathbb{F}_5$ of dimension 2. The columns will be `label, a0, a1, a2, a3, a4, p_rank, is_simple, has_principal_polarization, has_jacobian`, where `a0`-`a4` are the coefficients of the $L$-polynomial $L_A(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4$.

1. What is the total number of such abelian varieties?

2. What is the number of simple abelian varieties?

3. Compute the number of abelian varieties for each possible $p$-rank, where

$$
\mathrm{rank}_p(A) := \dim_{\mathbb{F}_p}(A(\overline{\mathbb{F}_p})[p])
$$

4. How many of them are principally polarizable? How many of them contain a Jacobian of a curve?

5. Find the largest and smallest possible values of $a_3$.
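
As a quick sanity check on what the $L$-polynomial coefficients encode, the sketch below builds one by hand as the product of two elliptic-curve factors over $\mathbb{F}_5$, $(1 - t + 5t^2)(1 + 5t^2)$, rather than taking a row from LMFDB. It relies on two standard facts: the $p$-rank is the largest $i \le g$ with $p \nmid a_i$, and $L_A(1) = \#A(\mathbb{F}_q)$. The helper names `poly_mul` and `p_rank` are illustrative.

```python
def poly_mul(f, g):
    """Multiply two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def p_rank(coeffs, p):
    """Largest i <= g with p not dividing a_i (a_0 = 1, so i = 0 always qualifies)."""
    g = (len(coeffs) - 1) // 2
    return max(i for i in range(g + 1) if coeffs[i] % p != 0)


p = 5
ordinary = [1, -1, 5]      # 1 - t + 5 t^2, trace of Frobenius 1
supersingular = [1, 0, 5]  # 1 + 5 t^2, trace of Frobenius 0

a0, a1, a2, a3, a4 = poly_mul(ordinary, supersingular)
print([a0, a1, a2, a3, a4])
print(p_rank([a0, a1, a2, a3, a4], p))
print(a0 + a1 + a2 + a3 + a4)  # L(1), the number of F_5-points
```

The product also shows the symmetry $a_3 = q\,a_1$ and $a_4 = q^2$ for $g = 2$, so the range of $a_3$ is governed by that of $a_1$.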

In [10]:
def query_av(q, g, limit=None):
    cols = ["label", "poly", "p_rank", "is_simple", "has_principal_polarization", "has_jacobian"]
    search_data = av_db.search({"q": q, "g": g}, cols, limit=limit)
    result = []
    for r in tqdm(search_data):
        label = r["label"]
        poly = r["poly"]
        a0, a1, a2, a3, a4 = poly  # assumes g = 2 (five coefficients)
        p_rank = r["p_rank"]
        is_simple = r["is_simple"]
        has_principal_polarization = r["has_principal_polarization"]
        has_jacobian = r["has_jacobian"]
        result.append(
            {
                "label": label,
                "a0": a0,
                "a1": a1,
                "a2": a2,
                "a3": a3,
                "a4": a4,
                "p_rank": p_rank,
                "is_simple": is_simple,
                "has_principal_polarization": has_principal_polarization,
                "has_jacobian": has_jacobian,
            }
        )
    schema = [
        ("label", pl.String),
        ("a0", pl.Int64),
        ("a1", pl.Int64),
        ("a2", pl.Int64),
        ("a3", pl.Int64),
        ("a4", pl.Int64),
        ("p_rank", pl.Int64),
        ("is_simple", pl.Int64),
        ("has_principal_polarization", pl.Int64),
        ("has_jacobian", pl.Int64),
    ]
    return pl.DataFrame(result, schema=schema)

In [11]:
df = query_av(5, 2)
df.write_csv(pathlib.Path.cwd() / "av_fq_isog_q5_g2.csv")

print(f"Total number of abelian varieties over F_5 of dimension 2: {len(df)}")
print(f"Number of simple abelian varieties: {df['is_simple'].sum()}")
print("Number of abelian varieties by p-rank:")
print(df.group_by("p_rank").count().sort("p_rank"))
print("Number of abelian varieties by principal polarizability:")
print(df.group_by("has_principal_polarization").count().sort("has_principal_polarization"))
print("Number of abelian varieties by Jacobian:")
print(df.group_by("has_jacobian").count().sort("has_jacobian"))
print(f"Number of abelian varieties by a_3: {df.group_by('a3').count().sort('a3')}")

129it [00:00, 192.85it/s]

Total number of abelian varieties over F_5 of dimension 2: 129
Number of simple abelian varieties: 84
Number of abelian varieties by p-rank:
shape: (3, 2)
┌────────┬───────┐
│ p_rank ┆ count │
│ ---    ┆ ---   │
│ i64    ┆ u32   │
╞════════╪═══════╡
│ 0      ┆ 7     │
│ 1      ┆ 20    │
│ 2      ┆ 102   │
└────────┴───────┘
Number of abelian varieties by principal polarizability:
shape: (2, 2)
┌────────────────────────────┬───────┐
│ has_principal_polarization ┆ count │
│ ---                        ┆ ---   │
│ i64                        ┆ u32   │
╞════════════════════════════╪═══════╡
│ -1                         ┆ 2     │
│ 1                          ┆ 127   │
└────────────────────────────┴───────┘
Number of abelian varieties by Jacobian:
shape: (2, 2)
┌──────────────┬───────┐
│ has_jacobian ┆ count │
│ ---          ┆ ---   │
│ i64          ┆ u32   │
╞══════════════╪═══════╡
│ -1           ┆ 14    │
│ 1            ┆ 115   │
└──────────────┴───────┘
Number of abelian varieties by a_3: 




Bonus questions.

- Build a machine learning model that predicts parity of Artin representations from Frobenius traces.
- For a fixed $p$, build a machine learning model that predicts $a_{p}$ of Artin representations from all $a_{p'}$ with $p' < p$.
- Build a machine learning model that predicts $p$-rank from $L$-polynomials.
- Build a machine learning model that predicts polarizability from $L$-polynomials.
- Try to query other data. [This](https://www.lmfdb.org/api/) and [this](https://github.com/roed314/psycodict/blob/master/QueryLanguage.md) pages might be helpful.
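
A possible starting point for the first bonus question is to read the CSV written in Q3 and split each row into a feature vector of Frobenius traces and a parity label. This sketch assumes the column layout used above (`Is_Even` plus trace columns whose names start with `a_`); the `load_xy` helper and the two inline rows are made up for illustration, and the choice of model is left open.

```python
import csv
import io


def load_xy(fileobj):
    """Return (X, y): trace columns a_* as features, Is_Even as label."""
    reader = csv.DictReader(fileobj)
    trace_cols = [c for c in reader.fieldnames if c.startswith("a_")]
    X, y = [], []
    for row in reader:
        X.append([int(row[c]) for c in trace_cols])
        y.append(int(row["Is_Even"]))
    return X, y


# Two made-up rows standing in for artin_reps_d3.csv.
sample = io.StringIO(
    "Baselabel,Dim,Conductor,Is_Even,a_002,a_003\n"
    "rep_1,3,229,1,-1,0\n"
    "rep_2,3,283,0,1,-1\n"
)
X, y = load_xy(sample)
print(X, y)
```

With the real file, the same function can be called on `open("artin_reps_d3.csv", newline="")`.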