Mine the proposal data of each presidential candidate #4

Closed
llucasreis opened this issue Sep 3, 2018 · 3 comments

@llucasreis
Owner

Each candidate's proposals are available in a web article on the Globo website. The data in each candidate's PDF should be mined to count how many times each area (economy, health, technology, etc.) is mentioned.
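
The counting step described above can be sketched as follows. This is only an illustration, not the code in the repository: it assumes the PDF text has already been extracted to a plain string, and both `AREA_KEYWORDS` and `count_area_mentions` are hypothetical names with made-up keyword lists.

```python
import re
from collections import Counter

# Hypothetical keyword lists per area; the real lists are not given in this issue.
AREA_KEYWORDS = {
    "economia": {"economia", "imposto", "emprego"},
    "saude": {"saúde", "hospital", "sus"},
    "tecnologia": {"tecnologia", "inovação", "internet"},
}


def count_area_mentions(text, area_keywords=AREA_KEYWORDS):
    """Count how many tokens of `text` belong to each area's keyword set."""
    # Lowercase the text and split it into word tokens.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    # Sum the occurrences of every keyword that belongs to an area.
    return {
        area: sum(counts[word] for word in words)
        for area, words in area_keywords.items()
    }


# Assumed sample text standing in for the extracted content of one proposal PDF.
sample = "Mais emprego e menos imposto. Ampliar o SUS e cada hospital."
print(count_area_mentions(sample))
```

The result is one dictionary per candidate, keyed by area, which matches the shape of the "candidate dictionary" mentioned later in this issue.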

@llucasreis llucasreis added the enhancement New feature or request label Sep 3, 2018
@llucasreis llucasreis added this to the setembro milestone Sep 3, 2018
@llucasreis llucasreis self-assigned this Sep 3, 2018
llucasreis added a commit that referenced this issue Sep 9, 2018
To gather data for a candidates' dataset, an algorithm was developed to access every candidate's proposal. The PDF files were saved to serve as a data provider along with the algorithm.

See also: #4, #3
llucasreis added a commit that referenced this issue Sep 9, 2018
To create a candidates' dataset, it is necessary to collect and mine data about the candidates and then provide that data. This algorithm was developed only to provide the candidates' data to another piece of code, which then creates the dataset. The algorithm is not complete yet and may still change.

See also: #4, #3
@llucasreis
Owner Author

llucasreis commented Sep 21, 2018

update:

  • Check whether TF/IDF can be used to improve the degree of importance assigned to each candidate for each area.

@llucasreis
Owner Author

update:
The TF/IDF approach will be carried out after issue #5 is completed.

@llucasreis llucasreis modified the milestones: setembro, outubro Sep 27, 2018
llucasreis added a commit that referenced this issue Oct 29, 2018
Add new code to create a dataset for content-based filtering. The TF-IDF metric was implemented to improve the candidates' results.

See also: #4
@llucasreis
Owner Author

Final status:

Two algorithms were developed from mining the candidates' proposals. The first returns the candidate's dictionary based on the term counts for the areas mentioned, and the second returns the candidate's dictionary using the TF/IDF metric.
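
For reference, a minimal sketch of the TF/IDF weighting behind the second dictionary. It is not the repository's implementation: it assumes the plain textbook variant (term frequency times `log(N / df)`, no smoothing), and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tf_idf(documents):
    """Return {doc_name: {term: weight}} for a {doc_name: text} mapping."""
    tokenized = {
        name: re.findall(r"\w+", text.lower())
        for name, text in documents.items()
    }
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    weights = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens)
        # tf = relative frequency in this document, idf = log(N / df).
        weights[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


# Assumed toy documents standing in for two candidates' proposals.
docs = {
    "candidate_a": "economia economia saude",
    "candidate_b": "saude tecnologia",
}
weights = tf_idf(docs)
print(weights["candidate_a"])
```

In this variant a term that appears in every proposal gets weight 0, so only the terms that set one candidate apart from the others contribute, which is the reason for preferring it over raw counts.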

A recommendation algorithm will be built that takes the data from data_provider.py and data_provider_2.py and performs content-based filtering to recommend candidates.
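
One possible shape for that recommender, sketched under assumptions: the issue does not say how the filtering will be done, so cosine similarity between a user's area preferences and each candidate's area weights is used here purely as an illustration. The names `cosine_similarity` and `recommend` and all the sample numbers are hypothetical, and nothing below reflects the actual interface of data_provider.py or data_provider_2.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An all-zero profile is similar to nothing.
    return dot / (norm_a * norm_b)


def recommend(user_profile, candidate_profiles, top_n=3):
    """Rank candidates by similarity to the user's area preferences."""
    scored = [
        (name, cosine_similarity(user_profile, profile))
        for name, profile in candidate_profiles.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Assumed per-area weights, e.g. as produced by a counting or TF/IDF step.
candidates = {
    "candidate_a": {"economia": 0.8, "saude": 0.1, "tecnologia": 0.1},
    "candidate_b": {"economia": 0.1, "saude": 0.2, "tecnologia": 0.7},
}
user = {"tecnologia": 1.0}
ranking = recommend(user, candidates)
print(ranking)
```

Because both inputs are dictionaries keyed by area, the same function accepts either the term-count dictionary or the TF/IDF dictionary.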
