Code for WSM 2020 Project: Chinese WestLaw system
- Python3
- numpy
- Django 3.0.6 (For web interface)
- pkuseg
- nltk
- fuzzywuzzy
Download the original data from the course website.
- Put the scripts `utils/index_for_data1.py` and `utils/index_for_data2.py` into the data folder, at the same level as `zxgk/` for data1 and `info/` for data2.
- Remove the duplicated files, rename the `.json` files with successive integers, and build the inverted index: run `python index_for_data1.py` and `python index_for_data2.py`.
- Build the inverted index and tf-idf dictionary.
- Put the generated indexes and original data into the folders `/data1/` and `/data2/`.
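
For readers unfamiliar with the indexing step, the following is a minimal sketch of building an inverted index and a tf-idf dictionary. It is not the code in `index_for_data1.py` or `index_for_data2.py`. The function names, the toy English documents, and the whitespace tokenizer are assumptions made for illustration, and the weighting `tf * log(N / df)` is one common variant that the project's scripts may not use.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs, tokenize=str.split):
    """Map each term to {doc_id: term frequency}.

    `docs` is a dict of doc_id -> text. `tokenize` is a placeholder
    (assumption): for Chinese text, a word segmenter would be passed
    here instead of whitespace splitting.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return dict(index)


def build_tfidf(index, n_docs):
    """Weight every posting by tf * log(N / df) (one common variant)."""
    tfidf = {}
    for term, postings in index.items():
        idf = math.log(n_docs / len(postings))  # df = number of docs containing the term
        tfidf[term] = {doc_id: tf * idf for doc_id, tf in postings.items()}
    return tfidf


# Toy corpus keyed by successive integers, mirroring the renamed .json files.
docs = {
    0: "court ruling on contract dispute",
    1: "contract breach and damages",
    2: "court fee schedule",
}
index = build_inverted_index(docs)
tfidf = build_tfidf(index, len(docs))
```

Keying documents by successive integers keeps the posting lists compact, which is presumably why the `.json` files are renamed before indexing.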
We also provide the processed data (extraction code: izcl).