# Partition info loading speed updates

**Author**: Melissa DeLucchi

Loading partition info from the `_metadata` file can be very slow. This notebook loads a catalog that does not yet have the faster `partition_info.csv` file, then shows how to generate that file so that later loads are quicker.

In [1]:
from hipscat.catalog import Catalog
import os
import tempfile
import shutil

# Scratch directory that will hold a partial copy of the catalog
tmp_dir = tempfile.TemporaryDirectory()
tmp_path = tmp_dir.name

catalog_path = "/data3/epyc/data3/hipscat/catalogs/neowise_yr8"
metadata_path = os.path.join(catalog_path, "_metadata")

# Copy only catalog_info.json and _metadata, so the scratch catalog has no partition_info.csv
shutil.copyfile(os.path.join(catalog_path, "catalog_info.json"), os.path.join(tmp_path, "catalog_info.json"))
shutil.copyfile(metadata_path, os.path.join(tmp_path, "_metadata"))

'/tmp/tmpxlxa5zuf/_metadata'

Let's try loading the partition info using only the `_metadata` file. The `PartitionInfo` API decides which file to load from, and since there is no `partition_info.csv` in the directory yet, it has to fall back to `_metadata`.
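The choice of file can be pictured as a simple preference order. The helper below is a minimal, hypothetical sketch (`choose_partition_source` is not part of hipscat, and the library's real logic may check more than file existence): it returns `partition_info.csv` when present and otherwise falls back to `_metadata`.

```python
import os
import tempfile


def choose_partition_source(catalog_dir: str) -> str:
    """Hypothetical helper: pick the file partition info would be read from.

    Prefers the fast partition_info.csv; falls back to the slower _metadata.
    This mirrors the behaviour described in this notebook, not hipscat's code.
    """
    csv_path = os.path.join(catalog_dir, "partition_info.csv")
    metadata_path = os.path.join(catalog_dir, "_metadata")
    if os.path.exists(csv_path):
        return csv_path
    if os.path.exists(metadata_path):
        return metadata_path
    raise FileNotFoundError(f"no partition info found in {catalog_dir}")


# Usage: only _metadata exists at first, then partition_info.csv is added.
with tempfile.TemporaryDirectory() as scratch:
    open(os.path.join(scratch, "_metadata"), "wb").close()
    first = os.path.basename(choose_partition_source(scratch))
    open(os.path.join(scratch, "partition_info.csv"), "w").close()
    second = os.path.basename(choose_partition_source(scratch))
```

Here `first` is `_metadata` and `second` is `partition_info.csv`, matching the slow-then-fast sequence in the cells that follow.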

In [2]:
%%time

catalog = Catalog.read_from_hipscat(tmp_path)
len(catalog.get_healpix_pixels())



CPU times: user 10.4 s, sys: 3.54 s, total: 14 s
Wall time: 14 s


20010

That was slow (about 14 s of wall time), but once the partition info is loaded, we can easily save it to a CSV file.

Note that this API isn't ideal, because the caller has to supply the output path. We should consider:
- If the `PartitionInfo` was loaded via a `read_from_dir` call (as happens when loading through `Catalog.read_from_hipscat`), then we know the target catalog directory. We could save this path as a member variable.
- In the `write_to_file` call, if `partition_info_file` is empty, we could build a default path from the saved catalog directory.
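The two suggestions above can be sketched together. This is a hypothetical stand-in, not the hipscat `PartitionInfo` class: the name `PartitionInfoSketch`, the `catalog_base_dir` attribute, and the `Norder`/`Npix` column layout are all assumptions made for illustration. The loader remembers the catalog directory, and `write_to_file` falls back to `<catalog dir>/partition_info.csv` when no path is given.

```python
import csv
import os
import tempfile
from typing import List, Optional, Tuple


class PartitionInfoSketch:
    """Hypothetical stand-in illustrating the proposed write_to_file default."""

    def __init__(self, pixels: List[Tuple[int, int]], catalog_base_dir: Optional[str] = None):
        self.pixels = pixels  # assumed (order, pixel) pairs
        # A read_from_dir-style loader would record where the catalog lives.
        self.catalog_base_dir = catalog_base_dir

    def write_to_file(self, partition_info_file: Optional[str] = None) -> str:
        # Empty or missing path: derive one from the remembered catalog directory.
        if not partition_info_file:
            if self.catalog_base_dir is None:
                raise ValueError("no partition_info_file given and no catalog directory known")
            partition_info_file = os.path.join(self.catalog_base_dir, "partition_info.csv")
        with open(partition_info_file, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["Norder", "Npix"])  # assumed column names
            writer.writerows(self.pixels)
        return partition_info_file


# Usage: no path is passed, so the file lands in the catalog directory.
with tempfile.TemporaryDirectory() as catalog_dir:
    info = PartitionInfoSketch([(0, 11), (1, 44)], catalog_base_dir=catalog_dir)
    written = info.write_to_file()
    with open(written, newline="") as f:
        rows = list(csv.reader(f))
```

Raising an error when neither a path nor a remembered directory is available keeps the old explicit-path call working while avoiding a silent write to the current working directory.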

In [3]:
%%time

catalog.partition_info.write_to_file(partition_info_file = os.path.join(tmp_path, "partition_info.csv"))

CPU times: user 85.5 ms, sys: 13.3 ms, total: 98.8 ms
Wall time: 96.9 ms


Now that a `partition_info.csv` file exists, the `read_from_dir` method prefers it, and the warning about slowness no longer appears. Loading is also much faster: about 443 ms of wall time compared with 14 s from `_metadata`, roughly a 30x speedup.

In [4]:
%%time

catalog = Catalog.read_from_hipscat(tmp_path)
len(catalog.get_healpix_pixels())

CPU times: user 429 ms, sys: 15.9 ms, total: 445 ms
Wall time: 443 ms


20010
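The `%%time` magic is only available in IPython. To repeat this comparison in a plain Python script, a small wall-clock timer is enough. The sketch below uses `time.perf_counter` and reports wall time only (not the CPU split that `%%time` prints); `time_call` is a hypothetical helper name, and the commented lines assume hipscat and the catalog are available.

```python
import time
from typing import Any, Callable, Tuple


def time_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# With the notebook's objects (requires hipscat and the catalog on disk):
# catalog, seconds = time_call(Catalog.read_from_hipscat, tmp_path)
# print(len(catalog.get_healpix_pixels()), f"{seconds:.2f} s")

# Self-contained check with a built-in function:
total, seconds = time_call(sum, range(1000))
```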

In [5]:
tmp_dir.cleanup()