# Autocomplete on table names

## The basics

With `load_table()` we can dynamically load `DataSets` and their `Schemas`. However, we still have to type the table name as a string, which is cumbersome: it is easy to make a typo or to misremember the exact name. The classes described here give you autocomplete on table names and hence avoid these problems. To illustrate this, let's first generate some data.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.config("spark.ui.showConsoleProgress", "false").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
```

```python
import pandas as pd

# Register a small DataFrame as the temporary view "person_table"
(
    spark.createDataFrame(
        pd.DataFrame(
            dict(
                name=["Jack", "John", "Jane"],
                age=[20, 30, 40],
            )
        )
    ).createOrReplaceTempView("person_table")
)
```

## Catalogs

Using `Catalogs`, we can get autocomplete on all tables that Spark has access to.
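
Conceptually, this kind of autocomplete comes from turning the three-level names Spark uses (`catalog.database.table`) into nested attributes. The sketch below is not typedspark's implementation; it is a minimal pure-Python illustration that assumes a hard-coded list of qualified names in place of querying Spark. The `build_namespace` helper is hypothetical, and each leaf stores the table's qualified name where a real class would store something that loads the table.

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def build_namespace(qualified_names: Iterable[str]) -> SimpleNamespace:
    """Turn "catalog.database.table" names into nested attribute access.

    Assumes every name has exactly three dot-separated parts. Each leaf holds the
    qualified name; a real implementation would hold something that loads the table.
    """
    tree: Dict[str, Dict[str, Dict[str, str]]] = {}
    for qualified in qualified_names:
        catalog, database, table = qualified.split(".")
        tree.setdefault(catalog, {}).setdefault(database, {})[table] = qualified

    # Convert the nested dicts into nested namespaces: catalog -> database -> table.
    return SimpleNamespace(
        **{
            catalog: SimpleNamespace(
                **{database: SimpleNamespace(**tables) for database, tables in databases.items()}
            )
            for catalog, databases in tree.items()
        }
    )


# Hypothetical table listing; real names would come from Spark's catalog.
namespace = build_namespace(
    [
        "spark_catalog.default.person_table",
        "spark_catalog.sales.order_table",
    ]
)
print(namespace.spark_catalog.default.person_table)  # spark_catalog.default.person_table
```

Because `SimpleNamespace` keeps its attributes in an ordinary instance `__dict__`, `dir()` lists them at every level, which is what an interactive shell needs in order to complete them.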

```python
from typedspark import Catalogs

db = Catalogs(spark)
```

After running the above cell, we can use `db` to load our table. Notice that you'll get autocomplete here!
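
Tab completion in interactive shells such as IPython/Jupyter largely relies on `dir()` of the live object. The following hypothetical sketch (the `TableRegistry` class is made up for illustration and is not typedspark's code) resolves table names on demand with `__getattr__` and advertises them through `__dir__`, so they can be completed even though they are not defined on the class.

```python
from typing import Dict, List


class TableRegistry:
    """Hypothetical object that exposes table names as dynamic attributes."""

    def __init__(self, names: List[str]) -> None:
        # Placeholder values: a real implementation would store table loaders here.
        self._tables: Dict[str, str] = {name: name for name in names}

    def __getattr__(self, name: str) -> str:
        # Only called when normal attribute lookup fails, i.e. for table names.
        if name.startswith("_"):
            # Avoid infinite recursion, e.g. before __init__ has set self._tables.
            raise AttributeError(name)
        try:
            return self._tables[name]
        except KeyError:
            raise AttributeError(f"no table named {name!r}") from None

    def __dir__(self) -> List[str]:
        # Advertise the table names so dir(), and completion built on it, can list them.
        return sorted(set(super().__dir__()) | set(self._tables))


registry = TableRegistry(["person_table", "order_table"])
print(registry.person_table)  # person_table
print([name for name in dir(registry) if name.endswith("_table")])  # ['order_table', 'person_table']
```

Setting the tables as plain instance attributes would also populate `dir()`; keeping them in a separate dictionary just avoids clashes between table names and the object's own attributes.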

```python
persons, Person = db.spark_catalog.default.person_table()
```

We can use the `DataSet` and `Schema` just as we would before.

```python
persons.show()
```

```text
+----+---+
|name|age|
+----+---+
|Jack| 20|
|John| 30|
|Jane| 40|
+----+---+
```

```python
Person
```

Displaying `Person` shows the schema that was generated from the table:

```python
from pyspark.sql.types import LongType, StringType

from typedspark import Column, Schema


class PersonTable(Schema):
    name: Column[StringType]
    age: Column[LongType]
```

## Databases

`Catalogs` is often the only class you need. But if loading all catalogs takes too long, or if you only want to use one catalog anyway, you can use `Databases` instead. Use `Databases(spark, catalog_name=...)` to specify which catalog to load, or omit this parameter to load the default catalog (often `spark_catalog` or `hive_metastore`).
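
The saving comes from listing only one catalog. Here is a minimal sketch of that idea, in which a hypothetical in-memory `METASTORE` dict and `list_databases()` function stand in for the (potentially slow) calls to Spark's catalog, and `"spark_catalog"` is assumed to be the default catalog. A real implementation would ask Spark for its current catalog instead of hard-coding one (for example via `spark.catalog.currentCatalog()`, available since PySpark 3.4).

```python
from types import SimpleNamespace
from typing import Dict, List, Optional

# Hypothetical metastore: catalog -> database -> table names.
METASTORE: Dict[str, Dict[str, List[str]]] = {
    "spark_catalog": {"default": ["person_table"], "sales": ["order_table"]},
    "other_catalog": {"archive": ["old_table"]},
}

queried: List[str] = []  # records which catalogs were actually listed


def list_databases(catalog: str) -> Dict[str, List[str]]:
    """Stand-in for an expensive metastore call; logs which catalog was queried."""
    queried.append(catalog)
    return METASTORE[catalog]


def build_databases(
    catalog_name: Optional[str] = None, default_catalog: str = "spark_catalog"
) -> SimpleNamespace:
    """Expose database -> table attributes for a single catalog only."""
    catalog = catalog_name if catalog_name is not None else default_catalog
    databases = list_databases(catalog)  # other catalogs are never listed
    return SimpleNamespace(
        **{
            database: SimpleNamespace(**{table: f"{catalog}.{database}.{table}" for table in tables})
            for database, tables in databases.items()
        }
    )


namespace = build_databases()
print(namespace.default.person_table)  # spark_catalog.default.person_table
print(queried)  # ['spark_catalog']
```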

```python
from typedspark import Databases

db = Databases(spark)
```

```python
persons, Person = db.default.person_table()
```

## Database

Finally, if we only want to load the tables from a single database, we can use `Database`. Once again, we can either specify the database (through `Database(spark, db_name=...)`) or omit the argument to load the default database.
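
The cells in this notebook load tables both by calling the attribute (`db.default.person_table()`) and through `.load()` (`db.person_table.load()`). One way a handle can support both spellings is sketched below; `TableHandle`, `SingleDatabase` and `fake_loader` are hypothetical names, and `fake_loader` returns placeholder strings where the real code would return a `DataSet` and a `Schema`.

```python
from typing import Callable, Dict, List, Tuple

# Placeholder for the (DataSet, Schema) pair a real loader would return.
Loaded = Tuple[str, str]


class TableHandle:
    """Hypothetical handle for one table; calling it is shorthand for load()."""

    def __init__(self, path: str, loader: Callable[[str], Loaded]) -> None:
        self._path = path
        self._loader = loader

    def load(self) -> Loaded:
        return self._loader(self._path)

    def __call__(self) -> Loaded:
        # Delegate to load() so both spellings share a single code path.
        return self.load()


class SingleDatabase:
    """Hypothetical: expose only the tables of one database (default: "default")."""

    def __init__(
        self,
        tables_by_db: Dict[str, List[str]],
        loader: Callable[[str], Loaded],
        db_name: str = "default",
    ) -> None:
        for table in tables_by_db[db_name]:
            setattr(self, table, TableHandle(f"{db_name}.{table}", loader))


def fake_loader(path: str) -> Loaded:
    # Stand-in for loading the table: returns strings instead of a DataSet and a Schema.
    return (f"data:{path}", f"schema:{path}")


database = SingleDatabase({"default": ["person_table"], "sales": ["order_table"]}, fake_loader)
data, schema = database.person_table.load()
print(data)  # data:default.person_table
```

Routing `__call__` through `load()` means the two spellings cannot drift apart.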

```python
from typedspark import Database

db = Database(spark, db_name="default")
```

```python
persons, Person = db.person_table.load()
```