K-news keyword summarizer

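Project description

This project crawls one week of news articles, extracts the top keywords for each category (politics, economy, society), and summarizes the articles.

The top keywords are extracted by combining the LDA and TextRank algorithms.

Using Django, it visualizes the total number of crawled articles and, for each category, a chart of the number of articles per keyword.

Clicking a keyword in the chart lists the articles that match that keyword, showing the original URL and a summary for each.

Example run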

Main Page

Category Politics

Category's Topic 5

Article Summary

Running project

Software environment

OS: Ubuntu 20.04 LTS

Python version==3.8.5

Django version==3.2.4

If you get a crontab error, check your OS first.

crontab is not available on Windows, so you must use a Linux OS to use crontab.

Also, our project may not support Python versions below 3. We recommend using Python 3 or later.
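The two requirements above (a non-Windows OS for crontab, and Python 3) can be checked before installing anything. The snippet below is only a sketch: `environment_problems` is a hypothetical helper and is not part of this repository.

```python
import platform
import sys


def environment_problems(system=None, version=None):
    """Return a list of human-readable problems with the current environment.

    Hypothetical helper for illustration; not part of this project.
    """
    system = system or platform.system()
    version = tuple(version or sys.version_info[:2])
    problems = []
    if system == "Windows":
        # crontab is a Unix tool, so the scheduled jobs cannot be registered.
        problems.append("crontab is not available on Windows; use Linux")
    if version < (3, 0):
        problems.append("Python 3 is required")
    return problems


if __name__ == "__main__":
    for problem in environment_problems():
        print(problem)
```

An empty list means neither of the two issues described above applies.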

Before running our project

Our project uses Django 3.2.4 and MySQL for viewing and saving data, BeautifulSoup4 and Goose3 for crawling, and LDA (from gensim) and TextRank for extracting keywords.

Please read requirements.txt to see which packages to install.

Installing Mecab

Run the line below in your terminal:

$ bash <(curl -s https://raw.githubusercontent.com/konlpy/konlpy/master/scripts/mecab.sh)

After installing packages

You must create secrets.json and my_settings.py in the same location as manage.py. In secrets.json, you must write SECRET_KEY, like this:

{
    "SECRET_KEY" : "your secret key"
}
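For illustration, one common way for a Django settings module to read a value from such a file is sketched below. This README does not show how settings.py actually loads secrets.json, so `load_secret` is a hypothetical helper and the usage line in the comment is an assumption.

```python
import json


def load_secret(name, secrets_path):
    """Return one value from a JSON secrets file, failing loudly if it is absent.

    Hypothetical helper for illustration; the project's settings.py may differ.
    """
    with open(secrets_path, encoding="utf-8") as f:
        secrets = json.load(f)
    try:
        return secrets[name]
    except KeyError:
        raise KeyError(f"Set {name!r} in {secrets_path}") from None


# In settings.py this might be used as follows (BASE_DIR is the Django
# default project path setting; the exact wiring here is an assumption):
# SECRET_KEY = load_secret("SECRET_KEY", BASE_DIR / "secrets.json")
```

Raising an explicit error when the key is missing makes a forgotten secrets.json easier to diagnose than a later Django startup failure.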

In my_settings.py, you must write the database information (DATABASES), like this:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',  # MySQL engine
        'NAME': 'oss',  # database name
        'USER': 'root',  # user name for the database connection
        'PASSWORD': 'PASSWORD',  # user password
        'HOST': '127.0.0.1',  # database server address
        'PORT': '3306'  # database server port
    }
}
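Before migrating, a quick sanity check on this dictionary can catch a missing entry. The sketch below is a hypothetical helper, not part of the project; it only checks for the six keys this README lists and does not test the connection itself.

```python
# Keys listed in this README's example; Django itself does not require all of them.
REQUIRED_KEYS = ("ENGINE", "NAME", "USER", "PASSWORD", "HOST", "PORT")


def missing_db_keys(databases, alias="default"):
    """Return the README-listed keys that are absent from databases[alias]."""
    config = databases.get(alias, {})
    return [key for key in REQUIRED_KEYS if key not in config]


example = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "oss",
        "USER": "root",
        "PASSWORD": "PASSWORD",
        "HOST": "127.0.0.1",
        "PORT": "3306",
    }
}
print(missing_db_keys(example))  # an empty list means nothing is missing
```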

After setting up these two files, run the migrations:

$ python3 manage.py makemigrations

$ python3 manage.py migrate

If the migrations finish without errors, run the server:

$ python3 manage.py runserver

Now you can use the website. However, you may not have any news data or keywords yet.

Our project uses django-crontab to run the crawling and keyword extraction jobs at a set time every day. Crawling runs at 00:00 and keyword extraction at 01:00. You can change these times by modifying settings.py.

CRONJOBS = [
    ('0 0 * * *', 'crawling.cron.article_crawling_job', '>> log file location'),
    ('0 1 * * *', 'keywords.cron.lda_job', '>> log file location'),
]
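Each schedule string uses the standard five cron fields: minute, hour, day of month, month, day of week. As a sketch of how the times could be changed locally, the hypothetical helper `daily_at` below (not part of the project) builds such a string; the 03:30 and 04:30 times are arbitrary examples that keep keyword extraction one hour after crawling.

```python
def daily_at(hour, minute=0):
    """Build a cron expression that fires once a day at hour:minute.

    Hypothetical helper for illustration; not part of this project.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return f"{minute} {hour} * * *"


# Example local schedule: crawl at 03:30, extract keywords at 04:30.
CRONJOBS = [
    (daily_at(3, 30), 'crawling.cron.article_crawling_job', '>> log file location'),
    (daily_at(4, 30), 'keywords.cron.lda_job', '>> log file location'),
]
```

With django-crontab, entries in CRONJOBS normally take effect only after they are registered with `python3 manage.py crontab add` (and `python3 manage.py crontab show` lists the active jobs), so the registration step likely needs to be repeated after editing the schedule.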

Change the schedule here to run the jobs whenever you want. But please don't submit a pull request that changes these times.

Keyword Extractor

LDA and TextRank algorithms combined: see README_keyword.md

Summarizer

TextRank algorithm implementation: see README_summarize.md
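To give a rough idea of how TextRank-based extractive summarization works in general, here is a minimal pure-Python sketch. It is not the project's implementation (that is described in README_summarize.md). The regex sentence splitter, the regex tokenizer, and all function names are assumptions made for illustration; a Korean pipeline would more likely tokenize with Mecab.

```python
import math
import re
from itertools import combinations


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def similarity(a, b):
    """Word-overlap similarity normalized by the log of the sentence lengths."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank_summary(text, top_n=2, damping=0.85, iterations=50):
    """Return the top_n highest-ranked sentences, in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences
    # Build a symmetric weighted graph over sentences.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(sentences[i], sentences[j])
    out_sum = [sum(row) for row in weights]
    # Weighted PageRank by fixed-count power iteration.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


article = (
    "The central bank raised interest rates today. "
    "Analysts said the bank may raise rates again. "
    "Higher interest rates affect the bank loans. "
    "My cat sleeps."
)
print(textrank_summary(article))
```

Sentences that share vocabulary with many other sentences accumulate a higher score, so an unrelated sentence such as the last one in the example is left out of the summary.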

Contribution guidelines

If you want to contribute to our project, be sure to review the contribution guidelines (CONTRIBUTING.md). This project adheres to the code of conduct (CODE_OF_CONDUCT.md). By participating, you are expected to read both documents.

We use GitHub issues to track requests and bugs and to improve our project. If you run into a problem with the project, please open a new issue.

Contributors list

License

MIT License

