# Investment Funds: Documents: Daily Report (Informe Diário)

source: https://dados.cvm.gov.br/dataset/fi-doc-inf_diario

The INFORME DIÁRIO (daily report) is a statement containing the following information about the fund, as of the reference date:

- Total value of the fund's portfolio;
- Net asset value;
- Share (quota) value;
- Subscriptions received on the day;
- Redemptions paid on the day;
- Number of shareholders.

**Important**: Starting in May 2022, the Informe Diário data files for funds are made available as compressed CSV files (zip).
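As a rough illustration of that format, the sketch below builds a tiny zip archive in memory and reads the semicolon-separated CSV inside it using only the standard library. The sample row, the member file name, the UTF-8 encoding, and the `read_zipped_csv` helper are all made up for this example; the notebook itself relies on `unzip_csv_to_df`, whose internals are not shown here.

```python
import csv
import io
import zipfile

# Made-up sample in the same shape as the CVM files (semicolon-separated).
sample_csv = (
    "CNPJ_FUNDO;DT_COMPTC;VL_TOTAL;VL_QUOTA;VL_PATRIM_LIQ;CAPTC_DIA;RESG_DIA;NR_COTST\n"
    "00.123.456/0001-78;2022-05-02;1000.00;1.234567;990.00;10.00;5.00;42\n"
)

# Build a zip archive in memory so the example needs no network access.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("inf_diario_fi_sample.csv", sample_csv)
buffer.seek(0)


def read_zipped_csv(zip_file, encoding="utf-8"):
    """Return every row of every .csv member of the archive as a dict of strings."""
    rows = []
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                rows.extend(csv.DictReader(text, delimiter=";"))
    return rows


rows = read_zipped_csv(buffer)
print(rows[0]["VL_QUOTA"])
```

Every value comes back as a string, which mirrors the `dtype=str` choice made later in the notebook: type conversion is left as an explicit, separate step.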



### Import relevant packages

In [1]:
from datetime import datetime
from io import BytesIO
from typing import Optional, Union, List

import pandas as pd
import requests
from dateutil.relativedelta import relativedelta

from pyportela.models.DataResource import DataResource
from pyportela.services.CachedDownload import CachedDownload
from pyportela.utils import unzip_csv_to_df, download, digits_to_int

## Data resources

Here we build all the URLs relevant to our database.

Each URL has an expiration date that indicates whether it needs to be refreshed or whether our database is already up to date with it.

In [2]:
resources: List[DataResource] = []

# Historical data: one archive per year, 2004 through 2020.
for year in range(2004, 2021):
    file_name = f"inf_diario_fi_{year}.zip"
    url = "https://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/HIST/" + file_name
    resource = DataResource(dataset_id="br_gov_cvm", url=url, etag=file_name)
    resources.append(resource)

# Recent data: one archive per month, from January 2021 up to the current month.
dt = datetime(2021, 1, 1)
end = datetime.now()
while dt < end:
    file_name = f"inf_diario_fi_{dt.year}{dt.month:02d}.zip"
    url = "https://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/" + file_name
    resource = DataResource(dataset_id="br_gov_cvm", url=url, etag=file_name)
    resources.append(resource)
    dt = dt + relativedelta(months=1)

# Mark the most recent monthly file as expiring 12 hours from now.
resources[-1].expires = True
resources[-1].expires_at = datetime.now() + relativedelta(hours=12)
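To make the expiration idea concrete, here is a minimal sketch of how a refresh decision could be derived from `expires` and `expires_at`. The `CachedResource` class, its `fetched_at` field, and the `needs_refresh` rule are hypothetical stand-ins written for this example; how `DataResource` and `CachedDownload` actually interpret these fields is not shown in this notebook.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CachedResource:
    """Hypothetical stand-in for DataResource plus a local fetch timestamp."""

    url: str
    expires: bool = False
    expires_at: Optional[datetime] = None
    fetched_at: Optional[datetime] = None


def needs_refresh(resource: CachedResource, now: datetime) -> bool:
    """Decide whether a resource should be downloaded again (assumed rule)."""
    if resource.fetched_at is None:
        return True  # never downloaded
    if not resource.expires or resource.expires_at is None:
        return False  # no expiration set, so the cached copy is kept
    # Refresh once the expiration moment has passed and our copy predates it.
    return resource.fetched_at < resource.expires_at <= now


now = datetime(2024, 1, 10, 12, 0)
stale = CachedResource(
    url="https://example.invalid/a.zip",
    expires=True,
    expires_at=datetime(2024, 1, 5),
    fetched_at=datetime(2024, 1, 1),
)
print(needs_refresh(stale, now))
```

Under this rule a file fetched after its expiration moment is considered current, and a file whose expiration lies in the future is left alone until that moment passes.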

### Check URL expiration dates

In [3]:
ckan_url = "https://dados.cvm.gov.br/api/action/package_show?id=fi-doc-inf_diario"
ckan_url_res = requests.get(ckan_url).json()

In [4]:
# ckan_url_res

In [5]:
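for ckan_resource in ckan_url_res["result"]["resources"]:
    url = ckan_resource["url"]
    # Match the CKAN entry to one of the resources built above; skip unknown URLs.
    found = next((item for item in resources if item.url == url), None)
    if found is None:
        continue
    # Use the portal's last-modified timestamp as the expiration reference.
    found.expires_at = datetime.strptime(ckan_resource["last_modified"], "%Y-%m-%dT%H:%M:%S.%f")
    found.expires = True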
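The `strptime` format used in this cell requires a non-null value with fractional seconds. Whether the CVM portal ever returns a missing `last_modified` or one without microseconds has not been verified here; assuming it might, a more tolerant parser could look like the sketch below. `parse_last_modified` is a hypothetical helper, not part of `pyportela`.

```python
from datetime import datetime
from typing import Optional


def parse_last_modified(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp, with or without fractional seconds.

    Returns None when the value is missing or empty.
    """
    if not value:
        return None
    return datetime.fromisoformat(value)


with_fraction = parse_last_modified("2024-01-02T03:04:05.123456")
without_fraction = parse_last_modified("2024-01-02T03:04:05")
missing = parse_last_modified(None)
print(with_fraction, without_fraction, missing)
```

With such a helper, resources that come back as `None` could simply be skipped instead of raising inside the loop.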


In [6]:
# Uncomment the line below to see the resources that will be downloaded
# pd.DataFrame(f.model_dump() for f in resources)

## Parsing the Data

In [11]:
def to_df(zip_file: Union[str, BytesIO], etag: str) -> pd.DataFrame:
    """
    Take a zip file and return a pandas DataFrame with the data
    contained in the csv files inside it.
    """
    # Read everything as strings first, then convert column by column.
    df = unzip_csv_to_df(zip_file, sep=";", dtype=str)
    df["CNPJ_FUNDO"] = df["CNPJ_FUNDO"].apply(digits_to_int)
    df["DT_COMPTC"] = df["DT_COMPTC"].apply(
        lambda x: datetime.strptime(x, "%Y-%m-%d").date()
    )
    for col in [
        "VL_TOTAL",
        "VL_QUOTA",
        "VL_PATRIM_LIQ",
        "CAPTC_DIA",
        "RESG_DIA",
        "NR_COTST",
    ]:
        df[col] = df[col].astype(float)
    # Some files do not carry TP_FUNDO; add it so the schema stays uniform.
    if "TP_FUNDO" not in df.columns:
        df["TP_FUNDO"] = None
    df.columns = df.columns.str.lower()
    # Provenance columns: source file tag and extraction time (naive UTC).
    df["etag"] = etag
    df["edate"] = datetime.utcnow()
    # Ad hoc attribute with the target table name; most pandas operations do not preserve it.
    df.Name = "fi_doc_inf_diario"
    return df


#test_df = to_df(download(resources[0].url), resources[0].etag)
#test_df

In [12]:
#test_df2 = to_df(download(resources[-1].url), resources[-1].etag)
#test_df2
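The conversions performed by `to_df` can be illustrated on a single row without pandas. The sketch below is only an approximation: `digits_to_int_sketch` is a guess at what `pyportela.utils.digits_to_int` does (strip non-digit characters and convert to an integer), and both the sample row and `parse_row` are made up for the example.

```python
import re
from datetime import datetime


def digits_to_int_sketch(value: str) -> int:
    """Hypothetical helper: drop every non-digit character, then convert to int."""
    return int(re.sub(r"\D", "", value))


def parse_row(row: dict) -> dict:
    """Convert one raw row (all strings) into typed values with lowercase keys."""
    parsed = {
        "cnpj_fundo": digits_to_int_sketch(row["CNPJ_FUNDO"]),
        "dt_comptc": datetime.strptime(row["DT_COMPTC"], "%Y-%m-%d").date(),
    }
    for col in ("VL_TOTAL", "VL_QUOTA", "VL_PATRIM_LIQ", "CAPTC_DIA", "RESG_DIA", "NR_COTST"):
        parsed[col.lower()] = float(row[col])
    return parsed


# Made-up row in the same shape as the CVM files.
raw = {
    "CNPJ_FUNDO": "00.123.456/0001-78",
    "DT_COMPTC": "2022-05-02",
    "VL_TOTAL": "1000.00",
    "VL_QUOTA": "1.234567",
    "VL_PATRIM_LIQ": "990.00",
    "CAPTC_DIA": "10.00",
    "RESG_DIA": "5.00",
    "NR_COTST": "42",
}
typed = parse_row(raw)

# An integer CNPJ loses its leading zeros; zero-pad to 14 digits to display it.
cnpj_text = f"{typed['cnpj_fundo']:014d}"
print(typed["cnpj_fundo"], cnpj_text)
```

Storing the CNPJ as an integer keeps the key compact, at the cost of having to restore the leading zeros whenever it is shown or joined against a text column.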

## Saving Data

In the next cells we'll save data to the warehouse.

In [1]:
from pyportela.models.CountryCode import CountryCode
from pyportela.models.Organization import Organization
from pyportela.models.OrganizationType import OrganizationType

organization = Organization(CountryCode.BR, OrganizationType.COM, "CVM", "Comissão de Valores Mobiliários (CVM)")

In [2]:
organization.model_dump_json()

'{"country":"BR","org_type":"COM","name":"CVM","title":"Comissão de Valores Mobiliários (CVM)","description":null}'

In [3]:
organization.get_id()

'br_com_cvm'

In [4]:
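from pyportela.models.License import LICENSE_CC
# Dataset was used without being imported; this module path is assumed from the layout of the other pyportela models.
from pyportela.models.Dataset import Dataset

dataset = Dataset(organization, LICENSE_CC, "fundos")
dataset.model_dump_json()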

'{"organization":{"country":"BR","org_type":"COM","name":"CVM","title":"Comissão de Valores Mobiliários (CVM)","description":null},"license":{"name":"CC_BY_4","url":"https://creativecommons.org/licenses/by/4.0/","description":"Creative Commons Attribution 4.0 International"},"name":"fundos","description":null,"tags":null}'

In [5]:
dataset.get_id()

'br_com_cvm__fundos'
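The two identifiers printed above suggest a naming scheme: country, organization type, and organization name joined by single underscores, with the dataset name appended after a double underscore. The sketch below reproduces that pattern; it is inferred from the outputs only, and both functions are hypothetical rather than the actual `get_id()` implementations in `pyportela`.

```python
def organization_id(country: str, org_type: str, name: str) -> str:
    """Join the lowercase parts with single underscores (inferred scheme)."""
    return "_".join(part.lower() for part in (country, org_type, name))


def dataset_id(org_id: str, dataset_name: str) -> str:
    """Append the dataset name after a double underscore (inferred scheme)."""
    return f"{org_id}__{dataset_name.lower()}"


org_id = organization_id("BR", "COM", "CVM")
ds_id = dataset_id(org_id, "fundos")
print(org_id, ds_id)
```

A double underscore as the separator would let the organization part be split off from the dataset part unambiguously, provided neither part contains one itself.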