This project provides an interactive visualization, built with Dash, of blog data collected from http://blog.nogizaka46.com/
Further applications are still in progress.
python == 3.7.9
re == 2.2.1 (standard library)
requests == 2.24.0
bs4 == 4.9.3
fake_useragent == 0.1.11
pandas == 1.1.3
numpy == 1.19.2
json == 2.0.9 (standard library)
sqlalchemy == 1.3.20 (database usage)
dash == 1.17.0
python Visualize_server.py
The application runs on port 8900 by default. Once it is running, connect to 127.0.0.1:8900 to use the interactive interface. You should see a similar image in your browser. (Keep the program running while you use the application.)
- --loadfrom, -l: Load blog data from csv, json or database. Default: "csv"
- --port, -p: The port the application runs on. Default: 8900
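
As a rough illustration of how the two options above could be parsed, here is a minimal sketch using `argparse`. The function name `build_parser` is hypothetical, and the exact accepted strings (in particular `"database"`) are an assumption; the real Visualize_server.py may handle its arguments differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the two documented options."""
    parser = argparse.ArgumentParser(
        description="Run the visualization server (illustrative sketch)."
    )
    # Assumption: the accepted values are exactly these three strings.
    parser.add_argument(
        "--loadfrom", "-l",
        choices=["csv", "json", "database"],
        default="csv",
        help="Where to load blog data from.",
    )
    parser.add_argument(
        "--port", "-p",
        type=int,
        default=8900,
        help="The port the application runs on.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# behaves the same wherever it is run.
args = build_parser().parse_args(["-l", "json", "-p", "8901"])
print(args.loadfrom, args.port)
```

With no arguments, the parser falls back to the documented defaults, `"csv"` and `8900`.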
You may display the line plot based on various features, generations or combinations of members.
python main.py
This program crawls all the blogs on the Nogizaka46 official blog website and, by default, saves all the contexts, images and features. If you don't have a MySQL database, comment out line 25 and line 35, which call the datamanager.addDataFrametoDataBase function.
- --mode, -m: "init" or "update". With "update", crawling stops once it finds data that have already been saved, judged by date. (This does not update Number of Comments in the csv, json or database.) Default: "init"
- --all, -a: Use this parameter to update Number of Comments and other features for all blogs. If your database, csv or json contains graduated members, always add this parameter.
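
The difference between "init" and "update" described above can be sketched as follows. This is only an illustration under stated assumptions: posts are assumed to arrive newest first, "already saved" is assumed to mean a date no later than the latest saved date, and `select_new_posts` is a hypothetical helper, not a function from this project.

```python
from datetime import datetime


def select_new_posts(posts, latest_saved, mode="init"):
    """Return the posts that would be crawled in the given mode.

    posts: list of dicts with a "Date" key, assumed newest first.
    latest_saved: datetime of the newest post already stored.
    """
    if mode == "init":
        # "init" crawls everything, regardless of what is stored.
        return list(posts)
    new_posts = []
    for post in posts:
        # Assumption: the first post that is not newer than the latest
        # saved date marks the point where crawling stops.
        if post["Date"] <= latest_saved:
            break
        new_posts.append(post)
    return new_posts


# Made-up sample data for illustration only.
posts = [
    {"Title": "third", "Date": datetime(2020, 11, 3)},
    {"Title": "second", "Date": datetime(2020, 11, 2)},
    {"Title": "first", "Date": datetime(2020, 11, 1)},
]
fresh = select_new_posts(posts, datetime(2020, 11, 2), mode="update")
print([p["Title"] for p in fresh])
```

Because the loop stops at the first saved post, older posts are never revisited in this sketch, which is consistent with "update" leaving their Number of Comments unchanged.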
- Author
- Title
- Date
- Number of Comments
- Number of Characters in Context
- Number of Images
- Context Path
- Generation
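
To show how the features listed above might be used, here is a small sketch that averages one numeric feature per generation using only the standard library. The column names are copied from the list and are assumed to match the saved csv headers; the sample rows, the date format and the paths are made up, and `mean_by_generation` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration; real files may use different headers,
# date formats and paths.
SAMPLE_CSV = """Author,Title,Date,Number of Comments,Number of Characters in Context,Number of Images,Context Path,Generation
Member A,Hello,2020/11/01 10:00,120,350,2,contexts/a_1.txt,1
Member B,Good night,2020/11/02 21:30,80,500,4,contexts/b_1.txt,2
Member A,Thanks,2020/11/03 09:15,200,410,1,contexts/a_2.txt,1
"""


def mean_by_generation(csv_text, feature):
    """Average a numeric feature for each value of the Generation column."""
    totals = defaultdict(lambda: [0.0, 0])  # generation -> [sum, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        bucket = totals[row["Generation"]]
        bucket[0] += float(row[feature])
        bucket[1] += 1
    return {gen: total / count for gen, (total, count) in totals.items()}


print(mean_by_generation(SAMPLE_CSV, "Number of Comments"))
```

The same helper works for the other numeric features, such as Number of Images.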
Where the contexts and images are saved