seanstory/aon-react-demo-ui

A configurable, generic search UI for Archives of Nethys, built with Elastic App Search's Reference UI.

Updating configuration

The project can be configured via a JSON config file.

You can easily control things like...

  • The Engine the UI runs against
  • Which fields are displayed
  • The filters that are used

If you would like to make configuration changes, there is no need to regenerate this app from your App Search Dashboard!

You can simply open up the engine.json file, update the options, and then restart this app.

Configuration options

The following is a complete list of options available for configuration in engine.json.

| option | value type | required/optional | source |
|---|---|---|---|
| `engineName` | String | required | Found in your App Search Dashboard. |
| `endpointBase` | String | required* | (*) Elastic Enterprise Search deployment URL, e.g. `"http://127.0.0.1:3002"`. Not required if using App Search on swiftype.com. |
| `hostIdentifier` | String | required* | (*) Only required if using App Search on swiftype.com. |
| `searchKey` | String | required | Found in your App Search Dashboard. |
| `searchFields` | Array[String] | required | A list of fields that will be searched with your search term. |
| `resultFields` | Array[String] | required | A list of fields that will be displayed within your results. |
| `querySuggestFields` | Array[String] | optional | A list of fields that will be searched and displayed as query suggestions. |
| `titleField` | String | optional | The field to display as the title in results. |
| `urlField` | String | optional | A field with a URL to use as a link in results. |
| `sortFields` | Array[String] | optional | A list of fields that will be used for sort options. |
| `facets` | Array[String] | optional | A list of fields that will be available as "facet" filters. Read more about facets in the App Search documentation. |
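For illustration, a minimal engine.json might look like the following. The search key and endpoint here are placeholders, and the field choices are only an assumption based on the fields produced by the cleanup script later in this README (title, category, keywords, description, body, url):

```json
{
  "engineName": "aon-non-crawl",
  "endpointBase": "http://127.0.0.1:3002",
  "searchKey": "search-xxxxxxxxxxxxxxxx",
  "searchFields": ["title", "keywords", "description", "body"],
  "resultFields": ["title", "description", "url"],
  "titleField": "title",
  "urlField": "url",
  "sortFields": ["title"],
  "facets": ["category", "sub_category"]
}
```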

Deploy

npm run deploy

Where'd the data come from?

Originally, the data came from a scrape of the Archives of Nethys website for Pathfinder 2nd Edition. I scraped it using the Elastic App Search Crawler. When I scraped it, the Crawler was still in "beta", so I did a little extra cleanup to get the data formatted exactly as I wanted it.

First, I ran the crawler in local mode, with a config like:

max_indexed_links_count: 1000
max_extracted_links_count: 10000
max_url_length: 70
max_crawl_depth: 10
domain_allowlist:
  - https://2e.aonprd.com
seed_urls:
  - https://2e.aonprd.com/Actions.aspx
  - https://2e.aonprd.com/Afflictions.aspx
  - https://2e.aonprd.com/Ancestries.aspx
  - https://2e.aonprd.com/Archetypes.aspx
  - https://2e.aonprd.com/Backgrounds.aspx
  - https://2e.aonprd.com/Classes.aspx
  - https://2e.aonprd.com/Conditions.aspx
  - https://2e.aonprd.com/Creatures.aspx
  - https://2e.aonprd.com/Equipment.aspx?All=true
  - https://2e.aonprd.com/Feats.aspx
  - https://2e.aonprd.com/Hazards.aspx
  - https://2e.aonprd.com/Rules.aspx
  - https://2e.aonprd.com/Skills.aspx
  - https://2e.aonprd.com/SpellLists.aspx?Tradition=0
  - https://2e.aonprd.com/Traits.aspx
  - https://2e.aonprd.com/Rules.aspx?ID=1483
  - https://2e.aonprd.com/Rules.aspx?ID=748
  - https://2e.aonprd.com/Rules.aspx?ID=1587
  - https://2e.aonprd.com/Rules.aspx?ID=686
output_sink: file
output_dir: examples/output/aon

Next, I made a new directory:

mkdir examples/output/aon-cleaned/

Then, I cleaned the raw crawl JSON:

require 'json'
require 'nokogiri'

output_dir = 'examples/output/aon-cleaned/'
input_dir = 'examples/output/aon/'
Dir.foreach(input_dir) do |filename|
  next if filename == '.' or filename == '..'
  puts "Parsing #{filename}"
  json = File.read("#{input_dir}#{filename}")
  hsh = JSON.parse(json)
  parsed_data = Nokogiri::HTML.parse(hsh['content'])
  title = parsed_data.title.strip.gsub(' - Archives of Nethys: Pathfinder 2nd Edition Database','')
  categories = title.split(' - ')
  category = categories.size >= 2 ? categories[1] : nil
  sub_category = categories.size >= 3 ? categories[2] : nil
  common_keywords = ["Archives", "Nethys", "Wiki", "Archives of Nethys", "Pathfinder", "Official", "AoN", "AoNPRD", "PRD", "PFSRD", "2E", "2nd Edition"]
  keywords = parsed_data.at('meta[name=keywords]')['content'].split(', ') - common_keywords
  description = parsed_data.at('meta[name=description]')['content']
  description = Nokogiri::HTML(description)&.text
  body_content = parsed_data.at_css('[id="ctl00_MainContent_DetailedOutput"]')&.text
  body_content = parsed_data.at_css('[id="ctl00_RadDrawer1_Content_MainContent_DetailedOutput"]')&.text unless body_content
  result = {
    :title => title,
    :category => category,
    :sub_category => sub_category,
    :keywords => keywords,
    :description => description,
    :body => body_content,
    :url => hsh['url']
  }
  output_json = JSON.pretty_generate(result)
  File.open("#{output_dir}#{filename}", 'w') { |file| file.write(output_json) }
  puts "Parsed #{filename}"
rescue StandardError => e
  puts "Failed to parse #{filename} because #{e.class}: #{e.message}"
end

Then, I used the App Search Ruby Client (`gem install elastic-enterprise-search`) to index the cleaned JSON:

require 'json'
require 'elastic-enterprise-search'

host = '<host>'       # your Enterprise Search deployment URL
key = '<private key>' # an App Search private API key

ent_client = Elastic::EnterpriseSearch::Client.new(host: host)
ent_client.app_search.http_auth = key
client = ent_client.app_search
engine_name = 'aon-non-crawl'
documents = []
batch = 1
output_dir = 'examples/output/aon-cleaned/'
batch_ids = {}
Dir.foreach(output_dir) do |filename|
  next if filename == '.' or filename == '..'
  document = JSON.parse(File.read("#{output_dir}#{filename}"))
  # use a hash of the content fields as the id, so identical pages dedupe
  document[:id] = document.dup.slice('title', 'category', 'keywords', 'description', 'body').hash.to_s
  unless batch_ids.keys.include?(document[:id])
    documents << document
    batch_ids[document[:id]] = true
  end
  if documents.size == 100
    puts "indexing batch #{batch}"
    client.index_documents(engine_name, documents: documents)
    documents = []
    batch_ids = {}
    batch = batch + 1
  end
end
client.index_documents(engine_name, documents: documents)

If something goes wrong and you want to delete the engine's contents without deleting the engine itself, you can run:

batch = 0
loop do
  batch += 1
  result = client.list_documents(engine_name)
  ids = result.body['results'].map{|it| it['id']}
  if ids.empty?
    break
  end
  puts "Deleting batch #{batch}"
  client.delete_documents(engine_name, document_ids: ids)
end

License 📗

Apache-2.0 © Elastic
