WEBDEV-5328 Integrate search service with beta search backend (#16)

* Add new backends
* Remove fetchMetadata
* Add hit types
* Clean up minor doc issues
* Reinstate fetchMetadata for sake of testing
* Add schema for search hits
* Default to new metadata search
* Adjust response modeling to conform to new backend
* Toy with decoupling SearchService from its backends
* Clean up imports
* Abstract out shared search backend methods/options
* Clean up type imports
* Improve documentation
* Add hit factory and merge type for hits
* Organize hit properties
* Conform search response details to PPS shape
* Fix Hit type alias quirks
* Add mediatype field to text hits
* Update mock response factory
* Add search request model
* Document search backends
* Improve type safety of search response details
* Excise metadata fetching (now in separate service)
* Update search request params
* Update demo app with search type options
* Remove leftover fetchMetadata bits from tests
* Collapse responses directory
* Fix tests and formatting
* Add additional tests for new backends
* Add unit tests for search response details
* Avoid blank titles in demo
* Better organize backend tests
* Rename default-search-backend and update package exports
* Add hit-type tests and fix boolean fields
* Update sort params to PPS expected format
* Toggle unhelpful eslint rules
* Update README and version
* Further clarify usage in README
* Add sort options to demo
* Address eslint complaints
* Remove old advanced_search backend
* Add 'omit' option to aggregation params
* Fix demo app sort button
* Formatting: missed a semicolon
* Memoize search backends to avoid redundancy
* Include page_type and page_target params
* Better document search params
* Make new params optional, fix formatting
* Update URL param keys to PPS expectations
* Add tests for page_type and page_target params
* Add unit test for omitting aggregations
* Add additional unit tests for hit types
* Add aggregations to demo app
* Formatting
* Update obsolete documentation to refer to PPS
* Add FTS snippets to demo app
* Fix bug disallowing falsey search params
* Update obsolete documentation to refer to PPS
* Formatting
* Clean up snippet return types in demo app
* Add a demo app field for setting # rows to fetch
* Add aggregations_size param to requests and demo app
* Remove extra .ts extension on base-search-backend
* Move backend factory method onto SearchService
* Rename SearchBackendOptions(Interface)
* Prevent empty page_type and page_target
* Better document search parameters
* Clarify doc wording around search URL generation
* Remove empty search service constructor
* Move backend factory tests over to search service
* Add more documentation & clarifications
* Remove unavailable search types from the enum
* Rename some hit-related types/properties for clarity
* Simplify result type definition
* Clarify test descriptions
* Ensure error is not thrown for invalid hit types
* Improve search param docs
* Add more documentation around search params & schemas
* Improve documentation for search service interface
* Improve documentation for text/item hits
* Better document aggregations_size param
* Format doc comments more prettily
* Add test for aggregations_size
* Move result factory method to SearchResponseDetails
* Refer to Metadata types in hit models
* Revert rawMetadata to explicit type
* Clean up type imports
* Add a breakdown of the search params to the README
* Normalize some search response properties to camel case
* Add README sections documenting search params and responses
* Adjust README for clarity
* Rename Result to SearchResult to avoid overlap with Result<T, E>
* Formatting
* Add sorting method for aggregations
* Make aggregations immutable
* Correctly construct SearchRequest object
* Specify aggregation options type
* Better document aggregation sort options
* Remove trailing whitespace
* Log debugging info if it is present on the response
* Add debugging checkbox to demo app
* Move demo app queries into private fields
* Make debug info logs easier to navigate
* Document logging method and avoid errors on missing fields
* Add unit tests for debugging output
* Export AggregationSortType
* Add unit test for backend options
* Export search backend options interface
* Add unit tests for service path param
* Update lit to modern version (for demo app)
* Better organized backend tests
* v0.4.0
internetarchive · Sep 22, 2022 · 5720d85
1 parent bfbb495
Showing 45 changed files with 2,657 additions and 988 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # Internet Archive Search Service
 
-A service for searching and retrieving metadata from the Internet Archive.
+A service for searching the Internet Archive.
 
 ## Installation
 ```bash
@@ -13,6 +13,7 @@ npm install @internetarchive/search-service
 ```ts
 import {
   SearchService,
+  SearchType,
   SortParam,
   SortDirection
 } from '@internetarchive/search-service';
@@ -23,50 +24,118 @@ const params = {
   query: 'collection:books AND title:(goody)',
   sort: [dateSort],
   rows: 25,
-  start: 0,
   fields: ['identifier', 'collection', 'title', 'creator']
 };
 
-const result = await searchService.performSearch(params);
+const result = await searchService.search(params, SearchType.METADATA);
 if (result.success) {
   const searchResponse = result.success;
-  searchResponse.response.numFound // => number
-  searchResponse.response.docs // => Metadata[] array
-  searchResponse.response.docs[0].identifier // => 'identifier-foo'
+  searchResponse.response.totalResults // => number -- total number of search results available to fetch
+  searchResponse.response.returnedCount // => number -- how many search results are included in this response
+  searchResponse.response.results // => SearchResult[] array
+  searchResponse.response.results[0].identifier // => 'some-item-identifier'
+  searchResponse.response.results[0].title?.value // => 'some-item-title', or possibly undefined if no title exists on the item
 }
 ```
 
-### Fetch Metadata
+Currently available search types are `SearchType.METADATA` and `SearchType.FULLTEXT`.
 
-```ts
-const metadataResponse: MetadataResponse = await searchService.fetchMetadata('some-identifier');
+### Search parameters
 
-metadataResponse.metadata.identifier // => 'some-identifier'
-metadataResponse.metadata.collection.value // => 'some-collection'
-metadataResponse.metadata.collection.values // => ['some-collection', 'another-collection', 'more-collections']
-```
+The `params` object passed as the first argument to search calls can have the following properties:
 
-## Metadata Values
+#### `query`
+The full search query, which may include Lucene syntax.
 
-Internet Archive Metadata is expansive and nearly all metadata fields can be returned as either an array, string, or number.
+#### `rows`
+The maximum number of search results to be retrieved per page.
 
-The Search Service handles all of the possible variations in data formats and converts them to their appropriate types. For instance on date fields, like `date`, it takes the string returned and converts it into a native javascript `Date` value. Similarly for duration-type fields, like `length`, it takes the duration, which can be seconds `324.34` or `hh:mm:ss.ms` and converts them to a `number` in seconds.
+#### `page`
+Which page of results to retrieve, beginning from page 1.
+Each page is sized according to the `rows` parameter, so requesting `{ rows: 20, page: 3 }`
+would retrieve results 41-60, etc.
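
The page arithmetic described above can be sketched as follows. `resultRange` is a hypothetical helper written only for illustration; it is not part of `@internetarchive/search-service`, and it assumes 1-based result numbering, as in the `{ rows: 20, page: 3 }` example.

```ts
// Hypothetical helper (not part of the package): computes which 1-based
// result positions a given page covers, assuming pages start at 1.
function resultRange(rows: number, page: number): { first: number; last: number } {
  if (!Number.isInteger(rows) || rows < 1) {
    throw new RangeError('rows must be a positive integer');
  }
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be a positive integer');
  }
  const first = (page - 1) * rows + 1; // position of the first result on this page
  const last = page * rows; // position of the last result on this page
  return { first, last };
}

const range = resultRange(20, 3);
console.log(`${range.first}-${range.last}`); // 41-60
```

The last page of a real result set may of course hold fewer than `rows` results; the helper only describes the positions a page would cover.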
 
-There are parsers for several different field types, like `Number`, `String`, `Date`, and `Duration` and others can be added for other field types.
+#### `fields`
+An array of metadata field names that should be present on the returned search results.
 
-See `src/models/metadata-fields/field-types.ts`
+#### `sort`
+An array of sorting parameters to apply to the results.
+The first array element specifies the primary sort, the second element the secondary sort, and so on.
+Each sorting parameter has the form 
+```js
+{ field: string, direction: 'asc' | 'desc' }
+```
+where `field` is the name of the field to sort on (e.g., `title`) and `direction` specifies whether to sort in ascending or descending order.
+
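
To make the primary/secondary ordering concrete, here is a minimal client-side sketch. It does not show how the search backend sorts; `SortSpec`, `Row`, and `compareBySpecs` are hypothetical names introduced for illustration, and the sample rows are made up.

```ts
// Local stand-in for the sort parameter shape described above; the package
// exports its own SortParam type, which this sketch does not use.
type SortSpec = { field: string; direction: 'asc' | 'desc' };
type Row = Record<string, string>;

// Illustrative only: applies the sort specs in order, so a later spec is
// consulted only when every earlier spec compares equal.
function compareBySpecs(specs: SortSpec[]): (a: Row, b: Row) => number {
  return (a, b) => {
    for (const { field, direction } of specs) {
      const x = a[field] ?? '';
      const y = b[field] ?? '';
      if (x === y) continue; // tie on this field: fall through to the next spec
      const order = x < y ? -1 : 1;
      return direction === 'asc' ? order : -order;
    }
    return 0;
  };
}

const rows: Row[] = [
  { title: 'B', date: '2001' },
  { title: 'A', date: '2003' },
  { title: 'A', date: '2002' },
];
const sorted = [...rows].sort(
  compareBySpecs([
    { field: 'title', direction: 'asc' }, // primary sort
    { field: 'date', direction: 'desc' }, // secondary sort, used on title ties
  ])
);
console.log(sorted.map(r => `${r.title} ${r.date}`).join(', ')); // A 2003, A 2002, B 2001
```
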
+#### `aggregations`
+An object specifying which aggregations to retrieve with the query.
+To retrieve no aggregations at all, this object should be `{ omit: true }`.
+To retrieve aggregations for one or more keys, this object should resemble 
+```js
+{ simpleParams: ['subject', 'creator', /*...*/] }
+```
 
-### Usage
+To specify the number of buckets for individual aggregation types, the object
+should instead use the `advancedParams` property, resembling
+```js
+{ advancedParams: [{ field: 'subject', size: 2 }, { field: 'creator', size: 4 }, /*...*/] }
+```
 
-```ts
-metadata.collection.value // return just the first item of the `values` array, ie. 'my-collection'
-metadata.collection.values // returns all values of the array, ie. ['my-collection', 'other-collection']
-metadata.collection.rawValue // return the rawValue. This is useful for inspecting the raw response received.
+However, these advanced aggregation parameters are not currently supported by the backend and may be removed at 
+a later date.
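
The three shapes described above can be modelled as a union, sketched below. `AggregationParams` and `aggregationsFor` are hypothetical names for illustration and are not the package's exported types; the query string is the one from the usage example.

```ts
// Local stand-in types mirroring the three shapes described above; these
// are assumptions for illustration, not the package's exported types.
type AggregationParams =
  | { omit: true }
  | { simpleParams: string[] }
  | { advancedParams: { field: string; size: number }[] };

// Hypothetical helper: omits aggregations when no keys are requested,
// otherwise requests the given keys via simpleParams.
function aggregationsFor(keys: string[]): AggregationParams {
  return keys.length === 0 ? { omit: true } : { simpleParams: [...keys] };
}

const paramsWithAggregations = {
  query: 'collection:books AND title:(goody)',
  rows: 25,
  aggregations: aggregationsFor(['subject', 'creator']),
};
console.log(JSON.stringify(paramsWithAggregations.aggregations));
// {"simpleParams":["subject","creator"]}
```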
+
+#### `aggregationsSize`
+The number of buckets to be returned for all aggregation types.
+This defaults to 6 (the number of facets displayed for each type in the search results sidebar),
+but can be overridden using this parameter to retrieve more/fewer buckets as needed.
+
+#### `pageType`
+A string indicating what type of page this data is being requested for. The search backend may
+use a different set of default parameters depending on the page type. This defaults to
+`'search_results'`, and currently only supports `'search_results' | 'collection_details'`, with
+more types to be added in the future.
+
+#### `pageTarget`
+Used in conjunction with `pageType: 'collection_details'` to specify the identifier of the collection
+to retrieve results for.
+
+### Search types
+
+At present the only two types of search available are Metadata Search (`SearchType.METADATA`) 
+and Full Text Search (`SearchType.FULLTEXT`). This will eventually be extended to support other
+types of search including TV captions and radio transcripts. Calls that do not specify a search
+type will default to Metadata Search.
+
+### Return values
+
+Calls to `SearchService#search` return a Promise that resolves to a result object. On success, its
+`success` property holds a `SearchResponse` object; on failure, it carries a `SearchServiceError` instead.
+
+`SearchResponse` objects are structured similarly to this example:
+
+```js
+{
+  rawResponse: {/*...*/}, // The raw JSON fetched from the server
+  request: {
+    clientParameters: {/*...*/}, // The original client parameters sent with the request
+    finalizedParameters: {/*...*/} // The finalized request parameters as determined by the backend
+  },
+  responseHeader: {/*...*/}, // The header containing info about the response success/failure and processing time
+  response: {
+    totalResults: 12345, // The total number of search results matching the query
+    returnedCount: 50, // The number of search results returned in this response
+    results: [/*...*/], // The array of search results
+    aggregations: {/*...*/}, // A record mapping aggregation names to Aggregation objects
+    schema: {/*...*/} // The data schema to which the returned search results conform
+  }
+}
+```
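
As one way the counts in such a response might be used, the sketch below derives paging information from `totalResults` and `returnedCount`. `ResponseCounts` and `pagingSummary` are hypothetical names that model only the two fields shown above; the real `SearchResponse` type is not reproduced here.

```ts
// Minimal local model of two fields from the example above; the real
// response type carries more than this sketch declares.
interface ResponseCounts {
  totalResults: number;
  returnedCount: number;
}

// Hypothetical helper: how many pages the result set spans for a given
// `rows` value, and whether results remain after the given (1-based) page.
function pagingSummary(counts: ResponseCounts, rows: number, page: number) {
  if (!Number.isInteger(rows) || rows < 1) {
    throw new RangeError('rows must be a positive integer');
  }
  const totalPages = Math.ceil(counts.totalResults / rows);
  const fetchedSoFar = (page - 1) * rows + counts.returnedCount;
  return { totalPages, hasMore: fetchedSoFar < counts.totalResults };
}

const summary = pagingSummary({ totalResults: 12345, returnedCount: 50 }, 50, 1);
console.log(summary.totalPages, summary.hasMore); // 247 true
```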
 
-metadata.date.value  // return the date as a javascript `Date` object
+### Fetch Metadata
 
-metadata.length.value  // return the length (duration) of the item as a number of seconds, can be in the format "hh:mm:ss" or decimal seconds
-```
+As of v0.4.0, metadata fetching has been moved to the 
+[iaux-metadata-service](https://github.com/internetarchive/iaux-metadata-service) package
+and is no longer included as part of the Search Service.
 
 # Development