Archive Profiler

Scripts that generate profiles of various Web archives, to be saved in the Archive Profiles Repository.

Running Profiler Script

To set up and run the Profiler script, follow these steps:

Clone the repository.

$ git clone git@github.com:oduwsdl/archive_profiler.git

Change working directory.

$ cd archive_profiler

Install dependencies from the requirements file (add sudo before the pip command if necessary).

$ pip install -r requirements.txt

Run the script on the shipped sample cdx files.

$ python ./main.py cdx/*.cdx

If the script finishes without errors, it saves the profiles in the profiles folder. Next, update the config.ini file to reflect your collection, then run the profiler against your own cdx file(s). This generates profiles for your collection and saves them in the profiles directory, overwriting any existing files with the same name.

$ python ./main.py path/to/cdx/files/*.cdx
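
Because a new run overwrites existing files with the same name, you may want to copy the profiles folder aside before profiling your own collection. The following is a minimal sketch and not part of the repository: the function name backup_profiles and the profiles_backup folder are hypothetical, and it assumes the profiles are written to a folder named profiles in the working directory.

```python
import shutil
import time
from pathlib import Path


def backup_profiles(profiles_dir="profiles", backup_root="profiles_backup"):
    """Copy profiles_dir into a timestamped folder under backup_root.

    Both folder names are assumptions made for this sketch.
    Returns the backup path, or None if there is nothing to back up.
    """
    source = Path(profiles_dir)
    # Nothing to do if the folder is missing or empty.
    if not source.is_dir() or not any(source.iterdir()):
        return None
    # One folder per backup, named by local time (second resolution).
    target = Path(backup_root) / time.strftime("%Y%m%d-%H%M%S")
    # copytree creates the missing parent folders and fails if target exists.
    shutil.copytree(source, target)
    return target
```

Calling backup_profiles() from the repository root before running main.py would keep a copy of the sample profiles. Two calls within the same second map to the same folder name, so the second call would raise an error rather than overwrite the first backup.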

