WEBDEV-5328 Integrate search service with beta search backend (#16)
* Add new backends

* Remove fetchMetadata

* Add hit types

* Clean up minor doc issues

* Reinstate fetchMetadata for sake of testing

* Add schema for search hits

* Default to new metadata search

* Adjust response modeling to conform to new backend

* Toy with decoupling SearchService from its backends

* Clean up imports

* Abstract out shared search backend methods/options

* Clean up type imports

* Improve documentation

* Add hit factory and merge type for hits

* Organize hit properties

* Conform search response details to PPS shape

* Fix Hit type alias quirks

* Add mediatype field to text hits

* Update mock response factory

* Add search request model

* Document search backends

* Improve type safety of search response details

* Excise metadata fetching (now in separate service)

* Update search request params

* Update demo app with search type options

* Remove leftover fetchMetadata bits from tests

* Collapse responses directory

* Fix tests and formatting

* Add additional tests for new backends

* Add unit tests for search response details

* Avoid blank titles in demo

* Better organize backend tests

* Rename default-search-backend and update package exports

* Add hit-type tests and fix boolean fields

* Update sort params to PPS expected format

* Toggle unhelpful eslint rules

* Update README and version

* Further clarify usage in README

* Add sort options to demo

* Address eslint complaints

* Remove old advanced_search backend

* Add 'omit' option to aggregation params

* Fix demo app sort button

* Formatting: missed a semicolon

* Memoize search backends to avoid redundancy

* Include page_type and page_target params

* Better document search params

* Make new params optional, fix formatting

* Update URL param keys to PPS expectations

* Add tests for page_type and page_target params

* Add unit test for omitting aggregations

* Add additional unit tests for hit types

* Add aggregations to demo app

* Formatting

* Update obsolete documentation to refer to PPS

* Add FTS snippets to demo app

* Fix bug disallowing falsey search params

* Update obsolete documentation to refer to PPS

* Formatting

* Clean up snippet return types in demo app

* Add a demo app field for setting # rows to fetch

* Add aggregations_size param to requests and demo app

* Remove extra .ts extension on base-search-backend

* Move backend factory method onto SearchService

* Rename SearchBackendOptions(Interface)

* Prevent empty page_type and page_target

* Better document search parameters

* Clarify doc wording around search URL generation

* Remove empty search service constructor

* Move backend factory tests over to search service

* Add more documentation & clarifications

* Remove unavailable search types from the enum

* Rename some hit-related types/properties for clarity

* Simplify result type definition

* Clarify test descriptions

* Ensure error is not thrown for invalid hit types

* Improve search param docs

* Add more documentation around search params & schemas

* Improve documentation for search service interface

* Improve documentation for text/item hits

* Better document aggregations_size param

* Format doc comments more prettily

* Add test for aggregations_size

* Move result factory method to SearchResponseDetails

* Refer to Metadata types in hit models

* Revert rawMetadata to explicit type

* Clean up type imports

* Add a breakdown of the search params to the README

* Normalize some search response properties to camel case

* Add README sections documenting search params and responses

* Adjust README for clarity

* Rename Result to SearchResult to avoid overlap with Result<T, E>

* Formatting

* Add sorting method for aggregations

* Make aggregations immutable

* Correctly construct SearchRequest object

* Specify aggregation options type

* Better document aggregation sort options

* Remove trailing whitespace

* Log debugging info if it is present on the response

* Add debugging checkbox to demo app

* Move demo app queries into private fields

* Make debug info logs easier to navigate

* Document logging method and avoid errors on missing fields

* Add unit tests for debugging output

* Export AggregationSortType

* Add unit test for backend options

* Export search backend options interface

* Add unit tests for service path param

* Update lit to modern version (for demo app)

* Better organized backend tests

* v0.4.0
latonv committed Sep 22, 2022
1 parent bfbb495 commit 5720d85
Showing 45 changed files with 2,657 additions and 988 deletions.
121 changes: 95 additions & 26 deletions README.md
@@ -1,6 +1,6 @@
# Internet Archive Search Service

A service for searching and retrieving metadata from the Internet Archive.
A service for searching the Internet Archive.

## Installation
```bash
@@ -13,6 +13,7 @@ npm install @internetarchive/search-service
```ts
import {
SearchService,
SearchType,
SortParam,
SortDirection
} from '@internetarchive/search-service';
@@ -23,50 +24,118 @@ const params = {
query: 'collection:books AND title:(goody)',
sort: [dateSort],
rows: 25,
start: 0,
fields: ['identifier', 'collection', 'title', 'creator']
};

const result = await searchService.performSearch(params);
const result = await searchService.search(params, SearchType.METADATA);
if (result.success) {
const searchResponse = result.success;
searchResponse.response.numFound // => number
searchResponse.response.docs // => Metadata[] array
searchResponse.response.docs[0].identifier // => 'identifier-foo'
searchResponse.response.totalResults // => number -- total number of search results available to fetch
searchResponse.response.returnedCount // => number -- how many search results are included in this response
searchResponse.response.results // => SearchResult[] array
searchResponse.response.results[0].identifier // => 'some-item-identifier'
searchResponse.response.results[0].title?.value // => 'some-item-title', or possibly undefined if no title exists on the item
}
```

### Fetch Metadata
Currently available search types are `SearchType.METADATA` and `SearchType.FULLTEXT`.

```ts
const metadataResponse: MetadataResponse = await searchService.fetchMetadata('some-identifier');
### Search parameters

metadataResponse.metadata.identifier // => 'some-identifier'
metadataResponse.metadata.collection.value // => 'some-collection'
metadataResponse.metadata.collection.values // => ['some-collection', 'another-collection', 'more-collections']
```
The `params` object passed as the first argument to search calls can have the following properties:

## Metadata Values
#### `query`
The full search query, which may include Lucene syntax.

Internet Archive Metadata is expansive and nearly all metadata fields can be returned as either an array, string, or number.
#### `rows`
The maximum number of search results to be retrieved per page.

The Search Service handles all of the possible variations in data formats and converts them to their appropriate types. For instance on date fields, like `date`, it takes the string returned and converts it into a native javascript `Date` value. Similarly for duration-type fields, like `length`, it takes the duration, which can be seconds `324.34` or `hh:mm:ss.ms` and converts them to a `number` in seconds.
#### `page`
Which page of results to retrieve, beginning from page 1.
Each page is sized according to the `rows` parameter, so requesting `{ rows: 20, page: 3 }`
would retrieve results 41-60, etc.
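
The arithmetic behind `rows` and `page` can be sketched as follows. `pageRange` is a hypothetical helper written for illustration only; it is not part of the package, and it simply assumes the 1-based page numbering described above.

```ts
// Hypothetical helper (not provided by the package): computes which result
// positions, counted from 1, fall on a given page.
function pageRange(rows: number, page: number): { first: number; last: number } {
  if (!Number.isInteger(rows) || rows < 1) {
    throw new RangeError('rows must be a positive integer');
  }
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be a positive integer');
  }
  const first = (page - 1) * rows + 1; // first result on this page
  const last = page * rows; // last result on this page, if that many exist
  return { first, last };
}

const range = pageRange(20, 3);
console.log(`${range.first}-${range.last}`); // prints "41-60"
```

The final page of a result set may hold fewer than `rows` results, so `last` is an upper bound rather than a guarantee.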

There are parsers for several different field types, like `Number`, `String`, `Date`, and `Duration` and others can be added for other field types.
#### `fields`
An array of metadata field names that should be present on the returned search results.

See `src/models/metadata-fields/field-types.ts`
#### `sort`
An array of sorting parameters to apply to the results.
The first array element specifies the primary sort, the second element the secondary sort, and so on.
Each sorting parameter has the form
```js
{ field: string, direction: 'asc' | 'desc' }
```
where `field` is the name of the column to sort on (e.g., title) and `direction` is whether to sort ascending or descending.
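
To illustrate how primary and secondary sorts combine, here is a minimal local sketch. Sorting is actually performed by the search backend; `SortSpec`, `Row`, and `applySorts` are hypothetical names used only to demonstrate the ordering semantics, and the sketch assumes all compared values are strings.

```ts
// Hypothetical local types, mirroring the { field, direction } shape above.
type SortSpec = { field: string; direction: 'asc' | 'desc' };
type Row = Record<string, string>;

// Sorts a copy of `rows`, consulting each sort spec in order: later specs
// only break ties left by earlier ones.
function applySorts(rows: Row[], sorts: SortSpec[]): Row[] {
  return [...rows].sort((a, b) => {
    for (const { field, direction } of sorts) {
      const x = a[field] ?? '';
      const y = b[field] ?? '';
      if (x === y) continue; // tie on this field: fall through to the next sort
      const cmp = x < y ? -1 : 1;
      return direction === 'asc' ? cmp : -cmp;
    }
    return 0; // equal on every sort field
  });
}

const sorted = applySorts(
  [
    { title: 'B', year: '1990' },
    { title: 'A', year: '2001' },
    { title: 'A', year: '1985' },
  ],
  [
    { field: 'title', direction: 'asc' }, // primary sort
    { field: 'year', direction: 'desc' }, // secondary sort
  ]
);
console.log(sorted.map(r => `${r.title} ${r.year}`).join(', '));
// prints "A 2001, A 1985, B 1990"
```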

#### `aggregations`
An object specifying which aggregations to retrieve with the query.
To retrieve no aggregations at all, this object should be `{ omit: true }`.
To retrieve aggregations for one or more keys, this object should resemble
```js
{ simpleParams: ['subject', 'creator', /*...*/] }
```

### Usage
To specify the number of buckets for individual aggregation types, the object
should instead use the `advancedParams` property, resembling
```js
{ advancedParams: [{ field: 'subject', size: 2 }, { field: 'creator', size: 4 }, /*...*/] }
```
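
The three shapes described above can be modeled as a union type, as in this rough sketch. `AggregationParamsSketch` and `requestedAggregationFields` are hypothetical names for illustration; the package's own type definitions may be structured differently.

```ts
// Hypothetical union covering the three documented shapes of `aggregations`.
type AggregationParamsSketch =
  | { omit: true }
  | { simpleParams: string[] }
  | { advancedParams: { field: string; size: number }[] };

// Returns the aggregation keys a given params object asks for.
function requestedAggregationFields(params: AggregationParamsSketch): string[] {
  if ('omit' in params) return []; // no aggregations requested at all
  if ('simpleParams' in params) return params.simpleParams;
  return params.advancedParams.map(p => p.field);
}

console.log(requestedAggregationFields({ omit: true })); // prints []
console.log(requestedAggregationFields({ simpleParams: ['subject', 'creator'] }));
// prints [ 'subject', 'creator' ]
```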

```ts
metadata.collection.value // return just the first item of the `values` array, ie. 'my-collection'
metadata.collection.values // returns all values of the array, ie. ['my-collection', 'other-collection']
metadata.collection.rawValue // return the rawValue. This is useful for inspecting the raw response received.
However, these advanced aggregation parameters are not currently supported by the backend and may be removed at
a later date.

#### `aggregationsSize`
The number of buckets to be returned for all aggregation types.
This defaults to 6 (the number of facets displayed for each type in the search results sidebar),
but can be overridden using this parameter to retrieve more/fewer buckets as needed.

#### `pageType`
A string indicating what type of page this data is being requested for. The search backend may
use a different set of default parameters depending on the page type. This defaults to
`'search_results'`, and currently only supports `'search_results' | 'collection_details'`, with
more types to be added in the future.

#### `pageTarget`
Used in conjunction with `pageType: 'collection_details'` to specify the identifier of the collection
to retrieve results for.

### Search types

At present the only two types of search available are Metadata Search (`SearchType.METADATA`)
and Full Text Search (`SearchType.FULLTEXT`). This will eventually be extended to support other
types of search including TV captions and radio transcripts. Calls that do not specify a search
type will default to Metadata Search.

### Return values

Calls to `SearchService#search` will return a Promise that either resolves to a `SearchResponse`
object or rejects with a `SearchServiceError`.

`SearchResponse` objects are structured similarly to this example:

```js
{
rawResponse: {/*...*/}, // The raw JSON fetched from the server
request: {
clientParameters: {/*...*/}, // The original client parameters sent with the request
finalizedParameters: {/*...*/} // The finalized request parameters as determined by the backend
},
responseHeader: {/*...*/}, // The header containing info about the response success/failure and processing time
response: {
totalResults: 12345, // The total number of search results matching the query
returnedCount: 50, // The number of search results returned in this response
results: [/*...*/], // The array of search results
aggregations: {/*...*/}, // A record mapping aggregation names to Aggregation objects
schema: {/*...*/} // The data schema to which the returned search results conform
}
}
```
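
As one example of using these fields, the sketch below checks whether further pages remain after the current response. `SearchResponseSketch` is a hypothetical local interface covering only the fields it needs, and `hasMorePages` is not part of the package; it assumes 1-based `page` numbering and that `totalResults` counts every result matching the query.

```ts
// Hypothetical, pared-down view of the response shape shown above.
interface SearchResponseSketch {
  response: {
    totalResults: number;
    returnedCount: number;
    results: unknown[];
  };
}

// True if results remain beyond those seen up to and including this page.
function hasMorePages(res: SearchResponseSketch, rows: number, page: number): boolean {
  const seenSoFar = (page - 1) * rows + res.response.returnedCount;
  return seenSoFar < res.response.totalResults;
}

const firstPage: SearchResponseSketch = {
  response: { totalResults: 12345, returnedCount: 50, results: [] },
};
console.log(hasMorePages(firstPage, 50, 1)); // prints true
```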

metadata.date.value // return the date as a javascript `Date` object
### Fetch Metadata

metadata.length.value // return the length (duration) of the item as a number of seconds, can be in the format "hh:mm:ss" or decimal seconds
```
As of v0.4.0, metadata fetching has been moved to the
[iaux-metadata-service](https://github.com/internetarchive/iaux-metadata-service) package
and is no longer included as part of the Search Service.

# Development
