Rate limit to build package #547

Closed
AliferSales opened this issue Apr 8, 2018 · 5 comments

Comments

@AliferSales

I'm trying to build a package from a 1.58 GB CSV file, but I get the following result:

$ sudo quilt build alifersales/burj20141 build.yml 
Inferring 'transform: id' for README.md
Registering README.md...
Inferring 'transform: csv' for rj_bu_2014_1.csv
Serializing rj_bu_2014_1.csv...
  1%|| 9.18M/1.58G [00:00<00:27, 56.4MB/s]
Warning: failed fast parse on input rj_bu_2014_1.csv.
Switching to Python engine.
Killed

I can't figure out what the problem is or why the process was killed.

My file is:

$ head rj_bu_2014_1.csv
"data_geracao","hora_geracao","codigo_pleito","codigo_eleicao","sigla_uf","codigo_cargo","descricao_cargo","numero_zona","numero_secao","numero_local","numero_partido","nome_partido","codigo_municipio","nome_municipio","data_bu_recebido","qtde_eleitores_aptos","qtde_eleitores_faltosos","qtde_eleitores_comparecimento","codigo_tipo_eleicao","codigo_tipo_urna","descricao_tipo_urna","numero_votavel","nome_votavel","qtde_votos","codigo_tipo_votavel","numero_urna_efetivada","codigo_carga_urna_1","codigo_carga_urna_2","data_carga_urna","codigo_flashcard","cargo_pergunta_secao"
"14-10-2014","13:55:55","157","143","RJ","1","PRESIDENTE","1","1","1295","99","","60011","RIO DE JANEIRO","05-10-2014","426","173","253","1","1","APURADA","96","NULO","17","3","1474540","685.479.799.740.465.658.","223.454","22-09-2014","AC0F2D99","1 - 1"
"14-10-2014","13:55:55","157","143","RJ","1","PRESIDENTE","1","1","1295","99","","60011","RIO DE JANEIRO","05-10-2014","426","173","253","1","1","APURADA","95","BRANCO","9","2","1474540","685.479.799.740.465.658.","223.454","22-09-2014","AC0F2D99","1 - 1"
"14-10-2014","13:55:55","157","143","RJ","1","PRESIDENTE","1","1","1295","50","PSOL","60011","RIO DE JANEIRO","05-10-2014","426","173","253","1","1","APURADA","50","LUCIANA GENRO","9","1","1474540","685.479.799.740.465.658.","223.454","22-09-2014","AC0F2D99","1 - 1"
"14-10-2014","13:55:55","157","143","RJ","1","PRESIDENTE","1","1","1295","45","PSDB","60011","RIO DE JANEIRO","05-10-2014","426","173","253","1","1","APURADA","45","A�CIO NEVES","74","1","1474540","685.479.799.740.465.658.","223.454","22-09-2014","AC0F2D99","1 - 1"
"14-10-2014","13:55:55","157","143","RJ","1","PRESIDENTE","1","1","1295","13","PT","60011","RIO DE JANEIRO","05-10-2014","426","173","253","1","1","APURADA","13","DILMA","76","1","1474540","685.479.799.740.465.658.","223.454","22-09-2014","AC0F2D99","1 - 1"
"14-10-2014","13:55:55","157","143","RJ","1","PRESIDENTE","1","1","1295","40","PSB","60011","RIO DE JANEIRO","05-10-2014","426","173","253","1","1","APURADA","40","MARINA SILVA","64","1","1474540","685.479.799.740.465.658.","223.454","22-09-2014","AC0F2D99","1 - 1"
"14-10-2014","13:55:55","157","143","RJ","1","PRESIDENTE","1","1","1295","28","PRTB","60011","RIO DE JANEIRO","05-10-2014","426","173","253","1","1","APURADA","28","LEVY FIDELIX","1","1","1474540","685.479.799.740.465.658.","223.454","22-09-2014","AC0F2D99","1 - 1"
"14-10-2014","13:55:55","157","143","RJ","1","PRESIDENTE","1","1","1295","21","PCB","60011","RIO DE JANEIRO","05-10-2014","426","173","253","1","1","APURADA","21","MAURO IASI","1","1","1474540","685.479.799.740.465.658.","223.454","22-09-2014","AC0F2D99","1 - 1"
"14-10-2014","13:55:55","157","143","RJ","1","PRESIDENTE","1","1","1295","20","PSC","60011","RIO DE JANEIRO","05-10-2014","426","173","253","1","1","APURADA","20","PASTOR EVERALDO","1","1","1474540","685.479.799.740.465.658.","223.454","22-09-2014","AC0F2D99","1 - 1"

I suspect this is caused by a rate limit on builds, but that seems strange to me.

Can someone help me?

@kevinemoore
Contributor

This looks like quilt build is failing to parse your CSV. Build uses pandas read_csv to read CSV files into a DataFrame, then serializes the DataFrame. You might need to pass some additional parameters to pandas, which you can do with a build file. Are you building from a build file (e.g. build.yml), or straight from a directory path?
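
As a rough illustration of the read step described above (not quilt's actual code), the sketch below parses one row in the style of the sample file with pandas. The in-memory sample and its reduced two-column layout are assumptions made for the example; encoding is a regular read_csv parameter.

```python
import io

import pandas as pd

# Hypothetical two-column sample, encoded as ISO-8859-1 (an assumption
# about the kind of bytes a non-UTF-8 CSV may contain).
raw = '"numero_votavel","nome_votavel"\n"45","AÉCIO NEVES"\n'.encode("iso-8859-1")

# With the default UTF-8 decoding, the byte for "É" (0xC9) is not a valid
# sequence, so this read is expected to fail.
try:
    pd.read_csv(io.BytesIO(raw))
    default_ok = True
except UnicodeDecodeError:
    default_ok = False

# Passing the encoding explicitly lets pandas decode the bytes correctly.
df = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1", dtype=str)
print(default_ok, df["nome_votavel"].iloc[0])
```

If the explicit encoding is what makes the read succeed, the same parameter is a candidate to pass through the build file.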

@AliferSales
Author

Yes. The build.yaml that I'm using is:

$ cat build.yaml
contents:
  README:
    file: README.md
  rj_bu_2014_1:
    file: rj_bu_2014_1.csv

I can open the file with pandas, and the only non-default setting is the encoding, "ISO-8859-1".

However, I created another file from the head of my CSV:

$ head rj_bu_2014_1.csv >> new_file.csv

I then tried to build a package from this new file, and the error did not occur. That is why I asked about a rate limit.
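
A sample taken from the first lines can build cleanly while the full file fails if the problematic bytes only appear further down. One way to check for that, sketched here with the standard library only, is to scan the file for the first line that is not valid UTF-8. The function name first_non_utf8_line and the throwaway demo file are made up for this example.

```python
import os
import tempfile


def first_non_utf8_line(path):
    """Return (line_number, raw_bytes) for the first line that is not valid
    UTF-8, or None if the whole file decodes cleanly."""
    with open(path, "rb") as f:  # binary mode: nothing is decoded on read
        for number, line in enumerate(f, start=1):
            try:
                line.decode("utf-8")
            except UnicodeDecodeError:
                return number, line
    return None


# Demo on a temporary file: line 3 contains an ISO-8859-1 "É" (byte 0xC9).
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b'"nome_votavel"\n"DILMA"\n"A\xc9CIO NEVES"\n')
result = first_non_utf8_line(tmp.name)
os.remove(tmp.name)
print(result)
```

Reading line by line in binary mode keeps memory use low, so the scan should also be workable on a file of well over a gigabyte.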

@akarve
Member

akarve commented Apr 8, 2018

OK, temporary solution. Put transform: id underneath, and at the same level of indentation as, file: rj.... That will copy the file, and you can parse it in pandas.

There is probably a datetime or format/type change later in the full file that the Parquet serializer doesn't like. Later today I'll provide an upload link for the full file so we can see what's up. Thanks for reporting this.

@kevinemoore
Contributor

@AliferSales, you might want to try adding the encoding as a kwarg, like this:

  rj_bu_2014_1:
    file: rj_bu_2014_1.csv
    kwargs:
      encoding: ISO-8859-1

If that doesn't work, you can use transform: id as @akarve suggested, to build a package with the raw csv file. You can then push that package with quilt push --public alifersales/<packagename> so we'll be able to reproduce the error. Thanks!
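
Because the encoding ends up being handed to pandas, it can help to confirm that Python recognizes the name before building. The sketch below is a minimal check using codecs.lookup from the standard library; the helper name is_known_encoding is an assumption for this example. A name with transposed digits, such as ISO-8559-1, is not a registered codec, while ISO-8859-1 is.

```python
import codecs


def is_known_encoding(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


print(is_known_encoding("ISO-8859-1"))  # registered codec (Latin-1)
print(is_known_encoding("ISO-8559-1"))  # digits transposed: not registered
```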

@AliferSales
Author

Solved!

The problem really was with the pandas parsing. I didn't know that the parsing is done by pandas. I just needed to specify the encoding explicitly (in this case, ISO-8859-1), as @kevinemoore suggested, and bingo!

Thanks, @kevinemoore and @akarve. Quiltdata is a very good idea and looks very practical. I'm beginning to use it and I'm very excited.

:D
