Looking for corruption on the Federal Budget #67
Comments
Awesome, @franklinbaldo! IMHO the steps to bring this into our project would be something like these:
I think this is a great idea and should be put forward ; ) Do you feel like you need any help? |
I'm a Lawyer. I know nothing about code. So, I can't develop the tool. |
Worry not, my friend — we know how to handle that. Thanks for the awesome idea : ) |
Here's another source for the same data, but in API form (documentation). Unfortunately, information on the politician who made the amendment to the budget is not listed, as it was not available at the time the API was built (i.e. data is updated daily, but the fields made available have not been expanded upon for a long time). |
Can I use any Python lib/framework to scrape the data from the suggested websites ("search by CNPJ on sites such as http://www.ieptb.com.br/), lawsuits against the entity (court websites, JusBrasil), civil and criminal actions against the directors (court websites), donations from directors to the campaign of the congressperson or party (TSE), and others")? Or is it mandatory to use python-requests plus an HTML parser? I took a look at http://portal.convenios.gov.br/images/docs/CGSIS/csv/siconv_emenda.csv.zip and the file has a number that I think is the CNPJ of the NGO. So I think the first step is to get this number and "translate" it into the name of the NGO, to allow better searches on different search engines... |
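A minimal sketch of that first step, pulling the CNPJ-like numbers out of the CSV before looking up names. The column name `beneficiary_id`, the `;` delimiter, and the sample rows are assumptions made for illustration; the real SICONV file may use a different layout.

```python
import csv
import io
import re


def normalize_cnpj(raw):
    """Return the 14 digits of a CNPJ, or None if the value cannot be one."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 14 else None


def unique_cnpjs(csv_text, column, delimiter=";"):
    """Collect the distinct, normalized CNPJs found in one column."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    found = {normalize_cnpj(row.get(column)) for row in reader}
    found.discard(None)  # drop empty or malformed values
    return sorted(found)


# Made-up rows: the same CNPJ written two ways, one bad value, one blank.
sample = (
    "beneficiary_id;value\n"
    "12.345.678/0001-95;100\n"
    "12345678000195;50\n"
    "123;10\n"
    ";5\n"
)
print(unique_cnpjs(sample, "beneficiary_id"))
```

Stripping punctuation first means the same entity is counted once even when the number is formatted differently across rows.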
@anarcoder no need to restrict yourself to a specific lib or framework unless your code ends up being unreproducible by other people. As we don't commit to We tend to prefer Python as it's already our stack, but I see no problem in using other general purpose and widely available languages that don't require any setup (Ruby, shell script, etc…). |
Hi, I am listing other datasets where we can find information about the reputation of NGOs and individuals. They should not appear in any of those lists:
|
Great list, @franklinbaldo — thanks for that! And just in case: collecting data on campaign donors is already a topic in #76 |
…brasil#67 - fetching emendas.csv and saving as data/amendments.xz - translating columns names to english - TODO: download/create columns documentation and fetch beneficiaries info in src/fetch_cnpj_info.py
Hi guys, first of all congratulations on the awesome work you're doing here! I created a script to fetch the emendas.csv file from SICONV. I forked the project and created a branch here. Basically it downloads the dataset and translates the column names. There is also a simple notebook showing some records.
Here we have a PDF document in Portuguese; I think I can create an HTML document with the translation.
This one I will need a little help with, because it may need some refactoring, as the read_csv(name) method has the date part of the reimbursements filename in it. I don't know what impact that could have on other scripts. I will try to work on the APIs from Banco Nacional de Mandados de Prisão do CNJ and Cadastro Nacional de Condenados por Improbidade Administrativa do CNJ, as @franklinbaldo listed earlier. This one I think we could also use to validate the CNPJs in the reimbursements datasets. |
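The two steps the fetch script is described as doing (translating column names, then saving an .xz file) could look roughly like the sketch below. The Portuguese column names, the `;` delimiter, and the helper names are hypothetical and not taken from the actual script; the download step is left out so the example runs offline.

```python
import csv
import io
import lzma

# Hypothetical Portuguese -> English mapping; the real SICONV columns may differ.
TRANSLATION = {
    "NR_EMENDA": "amendment_number",
    "NOME_PARLAMENTAR": "congressperson_name",
}


def translate_header(csv_text, mapping, delimiter=";"):
    """Rename the header row; columns missing from the mapping are kept as-is."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    if not rows:
        return csv_text
    rows[0] = [mapping.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


def to_xz_bytes(csv_text):
    """Compress CSV text into .xz bytes, ready to be written to a data file."""
    return lzma.compress(csv_text.encode("utf-8"))


sample = "NR_EMENDA;NOME_PARLAMENTAR;OUTRA\n1;Fulano;x\n"
translated = translate_header(sample, TRANSLATION)
print(translated)
print(len(to_xz_bytes(translated)), "bytes compressed")
```

Keeping unknown columns untouched means a new field in the source file does not break the script; it only shows up untranslated.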
Sorry… is @marcusrehm's notebook link working for you? I can't read it:
$ curl -I https://github.com/datasciencebr/serenata-de-amor/issues/serenata-de-amor/develop/2016-12-12-marcusrehm-amendments.ipynb
HTTP/1.1 406 Not Acceptable
BTW your contribution seems very good, looking forward to reading the notebook ; ) |
@cuducos I think the link for this file is this https://github.com/marcusrehm/serenata-de-amor/blob/issue-67/develop/2016-12-12-marcusrehm-amendments.ipynb |
Rank of congresspersons and beneficiaries (cnpjs) with highest amounts of amendments and their values.
Thanks @baldoequeiroz ! @cuducos sorry for the wrong link... The correct one is the one you pointed to. I did some refactoring in the fetch script and in the notebook as well; it was (still is) pretty simple, for now it was just to show the data I got. I will try to work on the items I listed in the previous comment. |
Great notebook, great data collection @marcusrehm! Many thanks for that. I do believe a lot could be done with this data. Regarding editing |
Considering your comment on #167 I think I was misunderstood, I'm sorry about that. Putting my suggestion in this topic together with @Irio's suggestion in that topic, the proper usage would be (dates here are merely placeholders, not real dates in our
$ python src/fetch_cnpj_info.py data/2016-12-06-reimbursements.xz data/2016-12-11-amendments.xz
This would inform This way we can query for the full CNPJ data of all these companies ending up with a more complete Does that make sense? |
Yes @cuducos, it does make sense! My concern was just that, doing it this way, we need to force the columns holding the CNPJs in all files to be named "cnpj". By doing this we could change (or lose) the meaning of a certain column in a file; could it end up out of context in the data model? But it's not a big problem, we could address it in the dataset's documentation. I'm gonna make the changes in order to Do you think it would be possible to store the |
Good point, but that could be addressed in the code:
# not functional, just an example
cols = {'amendments': 'beneficiary', 'other_dataset': 'something_else'}
cnpj_col = cols.get(base_file_name, 'cnpj') |
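A runnable version of that idea, as a sketch only: the snippet above leaves `base_file_name` undefined, so the helper below derives it from a dated path such as `data/2016-12-11-amendments.xz`. The helper names and the date-prefix pattern are assumptions, not the code that ended up in `src/fetch_cnpj_info.py`.

```python
import os
import re

# Dataset name -> column holding the CNPJ; anything else falls back to "cnpj".
CNPJ_COLUMNS = {"amendments": "beneficiary"}
DEFAULT_COLUMN = "cnpj"


def base_file_name(path):
    """Strip the directory, the extension and a leading YYYY-MM-DD- prefix."""
    name = os.path.basename(path).split(".")[0]
    return re.sub(r"^\d{4}-\d{2}-\d{2}-", "", name)


def cnpj_column_for(path):
    """Return the name of the CNPJ column for the dataset stored at `path`."""
    return CNPJ_COLUMNS.get(base_file_name(path), DEFAULT_COLUMN)


print(cnpj_column_for("data/2016-12-11-amendments.xz"))
print(cnpj_column_for("data/2016-12-06-reimbursements.xz"))
```

With this layout, adding a dataset whose CNPJ column has another name only requires one new entry in the dictionary.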
It is already. Scripts in |
@cuducos I pushed the files with modifications, now The only thing to consider is the questions of columns with CNPJ's. As we spoke earlier, it is using a dictionary for dataset / columns: So when a new dataset is added, in order to fetch its CNPJ's one should add the entry at |
About the dataset of this issue, I renamed the script to I think that with these datasets we can try to correlate the congresspersons of the amendments (and their relatives) with beneficiaries and suppliers. Do you think it would be better to put these files in a specific folder like |
Sorry @cuducos ! I was talking about cnpj-info.xz. :) |
No need for that I guess. |
Hi Guys, I've made available the scripts and a simple analysis (notebook) regarding non-profit entities with agreements in execution that started after the date the entities became impeded, but I would like some help in understanding whether this reasoning is correct. @cuducos , @franklinbaldo when you have time, could you please review it? The notebook is this one. Please note it is a WIP, so the analysis I'm talking about goes up to the notebook section Impeded Non-Profit Entities. Besides that, I also made available the scripts to fetch datasets related to federal agreements and amendments, and the registers of companies/persons that suffered some federal sanction and can't enter into any kind of contract with the government. Basically it is:
|
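The check described in the comment above (agreements that started after the entity became impeded) can be sketched as a plain date comparison. The record layout, the function name, and the CNPJs and dates below are made up for illustration and are not taken from the notebook.

```python
from datetime import date


def started_while_impeded(agreements, impediments):
    """Flag agreements whose start date falls after the entity's impediment date.

    agreements: iterable of (cnpj, start_date) pairs
    impediments: dict mapping cnpj -> date the entity became impeded
    """
    flagged = []
    for cnpj, start in agreements:
        impeded_on = impediments.get(cnpj)
        if impeded_on is not None and start > impeded_on:
            flagged.append((cnpj, start, impeded_on))
    return flagged


# Made-up data: only the first agreement starts after its impediment date.
impediments = {"11111111000111": date(2015, 12, 31)}
agreements = [
    ("11111111000111", date(2016, 3, 1)),
    ("11111111000111", date(2015, 6, 1)),
    ("22222222000122", date(2016, 1, 1)),
]
flagged = started_while_impeded(agreements, impediments)
print(flagged)
```

An agreement starting on the impediment date itself is not flagged here; whether it should be is a judgment call the reviewers would need to settle.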
Hi @marcusrehm, Many thanks for the notebook. It took me a while to go through it because right now we're focused on the CEAP thing. This is the most feasible way to deliver to our Catarse supporters in the following weeks, so this is my priority these days. However, tackling SICONV is very promising for the next steps of the project, so I reinforce my thank you: you're taking our first step in that direction and that's awesome. My utterly douchebag comment would be to try to make your code a bit more readable. I'm not a PEP8 radical but sometimes your code is very difficult for human brains to parse IMHO. But please… don't let that douchebag part of the feedback get in our way. Your contribution is really good. It looks like interesting material to draw the attention of the press and also to support official reports denouncing these cases. Your comments in the notebook make it easy for newbies to understand what's going on and to make sense of the data. I overlooked the editions in the My only concern at this point is how to organize documentation. Serenata de Amor kind of grew up around CEAP and I'm not sure what's the best way to include documentation of SICONV etc. Maybe we need a more robust way to document what we're doing. For now maybe a hot fix would be to add What do you think about it? |
Hi @cuducos ! Glad to know that it will help Serenata de Amor with its next steps! Your comments about the code are indeed relevant, and I'll make the adjustments to make it more readable. I think the difference comes from how the notebook is displayed when I run it locally versus when it is viewed on GitHub. About your concern regarding how to organize documentation and subjects (CEAP and Federal Budget), I had the same feeling while I was working on the issue. I think we could do as you said, create a section in But regardless of the questions above, the 3 datasets with registers of companies that have some kind of issue with the federal government could be used to flag, or strengthen the suspicion about, companies that appear in CEAP reimbursements and/or any other future analysis. Maybe we can create another notebook and cross these datasets with the reimbursements. Happy New Year to you guys! :) |
Hi @cuducos ! I made the adjustments related to the layout of the notebooks and created a small section in These notebooks are available here and here. The code is available at issue-67 branch. |
It looks really good, many thanks for that. My last suggestions:
|
Yeah, it really makes sense.
Yes! :)
@cuducos I can upload them to the issue-67 branch. Is that ok? |
@marcusrehm We don't normally commit data. You can upload it to some file transfer service like WeTransfer and we will upload it to aws so it is available =) |
Many thanks @jtemporal ! ;) |
@cuducos , @jtemporal PR #185 created. I'll send the datasets later ok? |
@cuducos , @jtemporal The datasets are available at WeTransfer. The link to download is https://we.tl/G9I2WV4DGV. |
Many thanks, @marcusrehm! I'm gonna upload the datasets to our S3 and merge your PR soon ; ) All: I'm gonna close this loooong Issue as we have the datasets and an automated way to get updated versions of the data. But this is only the beginning. New ideas on how to use this data in analyses are still welcome; feel free to open new Issues about these hypotheses and solutions |
Issue #67 - Looking for corruption on the Federal Budget
The Brazilian Constitution allows each member of Congress to allocate a portion of the federal budget to a specific purpose. But there is a problem: the law also allows the congressperson to name the institution (NGO, association, foundation, public agency) that will receive the money. This creates a major risk of embezzlement if the money goes to entities controlled by the congressperson himself or herself.
The federal government publishes the list of entities that received funds this way. The list indicates which entity received the money, what it was supposed to do with it, and which congressperson authored the amendment. Address: http://portal.convenios.gov.br/images/docs/CGSIS/csv/siconv_emenda.csv.zip
We could build a tool to check the reputation of such entities. This information would point to amendments with a high risk of corruption.
We can get information about reputation from various sources: protests over civil debts (searching by CNPJ on sites such as http://www.ieptb.com.br/), lawsuits against the entity (court websites, JusBrasil), civil and criminal actions against its directors (court websites), donations from directors to the campaign of the congressperson or party (TSE), and others.
Português
A constituição brasileira permite que cada parlamentar destine uma parte do orçamento federal para um objetivo específico. Mas existe um problema porque a Lei também permite que o parlamentar indique a instituição (ONG, Associação, Fundação, órgão público) que irá receber esse dinheiro. Isso gera um grande risco de desvio de dinheiro, se o dinheiro for destinado a entidades controladas pelo próprio parlamentar.
O governo federal divulga a lista de entidades que receberam verbas dessa forma. Essa lista indica para qual entidade recebeu o dinheiro, o que ela deveria fazer e quem foi o parlamentar que foi autor da emenda. Endereço: http://portal.convenios.gov.br/images/docs/CGSIS/csv/siconv_emenda.csv.zip
Nós poderíamos construir uma ferramenta que verifique a reputação dessas entidades. Essa informação indicaria emendas com alto risco de corrupção.
Podemos obter as informações sobre a reputação a partir de várias fontes: protestos em razão de dívidas cíveis (buscar pelo CNPJ em sites como http://www.ieptb.com.br/), ações judiciais contra entidade (sites dos tribunais, jusbrasil), ações civis e criminais contra os dirigentes (sites de tribunais), doações de dirigentes a campanha do parlamentar ou do partido (TSE) e outras.