Provides applications based on data from nogizaka members' blogs

seanhsia/NogizakaBlog

NogizakaBlog

This project provides an interactive visualization, built with Dash, of data collected from http://blog.nogizaka46.com/

Further applications are still in progress.

Package Version


python == 3.7.9
re == 2.2.1 (standard library)
requests == 2.24.0
bs4 == 4.9.3
fake_useragent == 0.1.11
pandas == 1.1.3
numpy == 1.19.2
json == 2.0.9 (standard library)
sqlalchemy == 1.3.20 (database usage)
dash == 1.17.0

Execution Process

Visualization

python Visualize_server.py

The application runs on port 8900 by default. Once it has started, open 127.0.0.1:8900 in a browser to use the interactive interface. Keep the program running for as long as you use the application.

args

  1. --loadfrom, -l: Load blog data from csv, json or database. Default: "csv"
  2. --port, -p: The port the application runs on. Default: 8900
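
The two options above could be declared with argparse roughly as follows. This is a minimal sketch, not the code in Visualize_server.py: the function name build_parser is hypothetical, and treating "csv", "json" and "database" as the exact accepted values of --loadfrom is an assumption based on the description.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of the two documented flags; the real
    # script may declare them differently.
    parser = argparse.ArgumentParser(
        description="Sketch of the visualization server options"
    )
    parser.add_argument(
        "--loadfrom", "-l",
        choices=["csv", "json", "database"],  # assumed literal values
        default="csv",
        help="Where to load blog data from",
    )
    parser.add_argument(
        "--port", "-p",
        type=int,
        default=8900,
        help="Port the application runs on",
    )
    return parser


if __name__ == "__main__":
    # Equivalent to: python Visualize_server.py -l json -p 8901
    args = build_parser().parse_args(["-l", "json", "-p", "8901"])
    print(args.loadfrom, args.port)
```

Passing no arguments yields the documented defaults, "csv" and 8900.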

You can display the line plot for various features, generations, or combinations of members.

Crawling

python main.py

This program crawls every blog post on the official Nogizaka blog website and, by default, saves the text content ("context"), images and features of each post. If you don't have a MySQL database, comment out line 25 and line 35, which call the datamanager.addDataFrametoDataBase function.

args

  1. --mode, -m: "init" or "update". With "update", crawling stops once it reaches posts that have already been saved, judged by date. (This mode does not update Number of Comments in the csv, json or database.) Default: "init"
  2. --all, -a: Update Number of Comments and the other features for all blogs. Always add this parameter if your database, csv or json contains graduated members.
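
The difference between the modes can be illustrated with a small sketch. It is not taken from main.py: the function posts_to_save, its parameters, and the assumption that posts arrive newest first are all hypothetical, and the sample titles and dates are placeholders. It also assumes that --all simply disables the early stop.

```python
from datetime import date


def posts_to_save(posts, latest_saved, mode="init", update_all=False):
    """Return the posts a crawl would store.

    posts: iterable of (post_date, title) pairs, assumed newest first.
    latest_saved: date of the newest post already stored, or None.
    """
    selected = []
    for post_date, title in posts:
        already_saved = latest_saved is not None and post_date <= latest_saved
        # In "update" mode without --all, stop at the first known post.
        if mode == "update" and not update_all and already_saved:
            break
        selected.append((post_date, title))
    return selected


if __name__ == "__main__":
    sample = [
        (date(2020, 11, 3), "post c"),
        (date(2020, 11, 2), "post b"),
        (date(2020, 11, 1), "post a"),
    ]
    # Only the two posts newer than the stored one are selected.
    print(posts_to_save(sample, date(2020, 11, 1), mode="update"))
```

Because the early stop skips older posts entirely, their comment counts are never re-read, which matches the note that "update" leaves Number of Comments unchanged.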

features

  • Author
  • Title
  • Date
  • Number of Comments
  • Number of Characters in Context
  • Number of Images
  • Context Path
  • Generation
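
As an illustration of how these features might feed a line plot, the sketch below stores one post per record and sums a chosen feature per author and month, optionally filtered by member or generation. The class BlogRecord, its field names, the function monthly_totals and the monthly grouping are assumptions made for the example, and the sample records are placeholders rather than crawled data.

```python
import datetime
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BlogRecord:
    # One field per feature listed above; the names are hypothetical.
    author: str
    title: str
    date: datetime.date
    num_comments: int
    num_characters: int
    num_images: int
    context_path: str
    generation: int


def monthly_totals(records, feature, authors=None, generations=None):
    """Sum a numeric feature per (author, "YYYY-MM"), with optional filters."""
    totals = defaultdict(int)
    for rec in records:
        if authors is not None and rec.author not in authors:
            continue
        if generations is not None and rec.generation not in generations:
            continue
        key = (rec.author, rec.date.strftime("%Y-%m"))
        totals[key] += getattr(rec, feature)
    return dict(totals)


# Placeholder records, not real blog data.
records = [
    BlogRecord("Member A", "t1", datetime.date(2020, 11, 1), 120, 800, 3, "blogs/a/1", 1),
    BlogRecord("Member A", "t2", datetime.date(2020, 11, 15), 80, 500, 1, "blogs/a/2", 1),
    BlogRecord("Member B", "t3", datetime.date(2020, 11, 3), 50, 300, 2, "blogs/b/1", 2),
    BlogRecord("Member A", "t4", datetime.date(2020, 12, 1), 10, 200, 0, "blogs/a/3", 1),
]

if __name__ == "__main__":
    print(monthly_totals(records, "num_comments", authors={"Member A"}))
```

Each (author, month) total corresponds to one point on a member's line; passing generations={2} instead would restrict the plot to second-generation members.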

Files

blogs/

Directory where the post text ("contexts") and images are saved.