Evaluate Typesense cloud (poc) #731

Open
2 of 6 tasks
tnir opened this issue Jul 22, 2022 · 17 comments
Labels
good first issue, help wanted, performance

Comments

@tnir
Collaborator

tnir commented Jul 22, 2022

cf. #691

TODO / concerns

Backend / cost

  • Performance
  • Cost
  • Team on cloud

Frontend

DocSearch v3 is the current version, while Typesense's version was still on v2: #691 (comment)

  • DocSearch v3-compatible Kit
  • DocSearch v3-compatible UI
  • Frontend lifecycle (consistent maintenance with the upstream)

Resources

@tnir tnir changed the title from "Evaluate Typesense cloud" to "Evaluate Typesense cloud (poc)" Jul 22, 2022
@tnir
Collaborator Author

tnir commented Jul 22, 2022

Looping in @jasonbosco @deivid-rodriguez @simi 👋

@simi
Member

simi commented Jul 22, 2022

First of all, what's the problem with middleman-search? Are there any problems we face, anything we can fix in there?

@simi
Member

simi commented Jul 22, 2022

@jasonbosco would it be possible to provide some kind of sponsored account for RubyGems? We can add an attribution in the footer or on a special page, but we would like to avoid branding of the search box itself.

@deivid-rodriguez
Member

middleman-search has been abandoned, and needs updates (we're already using my fork from git at the moment). So the problem with it is that we would need to take over maintenance ourselves.

@tnir
Collaborator Author

tnir commented Jul 22, 2022

[...] what's problem [...]

Yes. The Bundler.io website team has to maintain search.js and search_arrow.js ourselves, on top of the abandoned middleman-search and its dependency on a legacy version of lunr.js, as already stated in the description of #691.

@tnir
Collaborator Author

tnir commented Jul 22, 2022

In my experiment with Typesense Cloud this week, performance still needed to be improved, even in the region nearest to me (without Search Delivery Network). @jasonbosco Per performance evaluation, can I use 8GB, 2vCPU (non-burst type) with Search Delivery Network for this purpose?

The cost would be:

Cluster
$1.45 /hr
Works out to $1,044.00 /month
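
The monthly figure follows from the hourly rate if a month is billed as 30 days of 24 hours. That 720-hour month is an assumption inferred from the two numbers above, not something confirmed in this thread. A minimal sketch in Python (the function name is hypothetical):

```python
from decimal import Decimal

HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day billing month (720 hours)


def monthly_cost(hourly_rate: str, hours: int = HOURS_PER_MONTH) -> Decimal:
    """Convert an hourly cluster price in USD to an approximate monthly price."""
    return Decimal(hourly_rate) * hours


print(monthly_cost("1.45"))  # 1044.00
```

Decimal is used instead of float so the currency arithmetic stays exact.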

@tnir
Collaborator Author

tnir commented Jul 22, 2022

I would say I already solved the logo problem in #706.

@simi
Member

simi commented Jul 22, 2022

@tnir would you mind opening the related issues at https://github.com/manastech/middleman-search? I can try to address them.

@deivid-rodriguez
Member

I have to say I'm more and more convinced that a local search solution would be best. It feels like overkill to use an external service to scrape a site as simple as ours. Should we take over middleman-search maintenance? It has always worked very well and, as far as I understand, we only need manastech/middleman-search#29 and manastech/middleman-search#38 (and we are already using manastech/middleman-search#38).

@jasonbosco

@jasonbosco would it be possible to provide some kind of sponsored account for RubyGems? We can add an attribution in the footer or on a special page, but we would like to avoid branding of the search box itself.

@simi Happy to provide a sponsored account for RubyGems, as long as the cost is not too high (since we ourselves are a bootstrapped company).

Although if we sponsor it, I would like to ask for the powered by logo to be shown in the search results. This is very similar to Algolia's ask as well when using their DocSearch version:

We know that paying for search infrastructure is a cost not all open source projects can afford. That's why we decided to keep DocSearch free for everyone. All we ask in exchange is that you keep the "Search by Algolia" logo displayed next to the search results.
Source: https://docsearch.algolia.com/docs/docsearch-program/#how-much-does-it-cost


@jasonbosco Per performance evaluation, can I use 8GB, 2vCPU (non-burst type) with Search Delivery Network for this purpose?

@tnir I don't think you would need 8GB of RAM to index the content from bundler.io (assuming that's the scope of this issue).

It seems like the number of pages is ~50 (please correct me if I'm wrong), so you might be able to fit all of this and much more in 512MB of RAM. With a 5 region SDN, you're looking at this configuration: https://cloud.typesense.org/pricing?memory=0.5_gb&vcpu=2_vcpus_1_hr_burst_per_day&high_perf_disk=no&typesense_server_version=0.23.1&ha=yes&sdn=5_regions&regions=n_california%2Cohio%2Cfrankfurt%2Cmumbai%2Ctokyo

~$110 a month, plus bandwidth.

The number of concurrent searches per second will determine how many vCPUs you need, but with 5 nodes you actually get 5 nodes * 2 vCPUs per node = 10 vCPUs in total. For your dataset, this should be sufficient as well.

The best way to determine RAM usage would be to run the typesense-docsearch-scraper against the site and index it into Typesense to observe memory usage.

If my estimates above hold up, I'm happy to sponsor this cluster for you.

@tnir
Collaborator Author

tnir commented Jul 22, 2022

There are 950-1000 pages, but most of them are not large. My guess is that the burst-type vCPU is one of the reasons for the slowness on Typesense Cloud. As I am not sure what kind of technology you use there, I defer to you on vCPUs per node 💪.
As you said above, the search traffic would be very low, I guess. Let me start with the minimal 0.5GB of memory. (Note that in my previous experiment, with about 1/3 of the production data volume, Typesense used 50-60MB of memory, so even if we improve the indexing, 0.5GB should be enough.)
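
The estimate above can be sanity-checked with a rough extrapolation. This is only a sketch: it assumes memory use grows linearly with the indexed data volume, which is a simplification and not something measured here, and the function name is hypothetical.

```python
def extrapolate_memory_mb(observed_mb: float, data_fraction: float) -> float:
    """Scale memory observed on a fraction of the data up to the full data set.

    Assumes memory use grows linearly with data volume (a simplification).
    """
    if not 0 < data_fraction <= 1:
        raise ValueError("data_fraction must be in (0, 1]")
    return observed_mb / data_fraction


PLAN_MB = 512  # the 0.5GB plan discussed above

# 50-60MB was observed with roughly one third of the production data.
low = extrapolate_memory_mb(50, 1 / 3)
high = extrapolate_memory_mb(60, 1 / 3)
print(f"estimated full-index memory: {low:.0f}-{high:.0f} MB of {PLAN_MB} MB")
```

Under that assumption the full index stays well below the plan size, which leaves room for indexing more content per page.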

@jasonbosco Then could I ask you to launch a single cluster (0.5GB memory, HA, 5-region SDN with 2 US, 1 EU, and 2 APAC regions, as you suggested) in the bundler-io project?

@simi
Member

simi commented Jul 22, 2022

Although if we sponsor it, I would like to ask for the powered by logo to be shown in the search results. This is very similar to Algolia's ask as well when using their DocSearch version:

I'm afraid that's exactly what I'm trying to avoid, and AFAIK there is no easy way for us to get a paid account. Let's focus on middleman-search and its maintenance @tnir.

@tnir
Collaborator Author

tnir commented Jul 22, 2022

Although if we sponsor it, I would like to ask for the powered by logo to be shown in the search results. This is very similar to Algolia's ask as well when using their DocSearch version:

Oops, I did not read this. If so, I prefer #706 now...

@tnir
Collaborator Author

tnir commented Jul 22, 2022

No, again, #706 meets all the requirements you (and I) have, so it seems that https://bundler-site-tnir-algol-j2zyh5.herokuapp.com/ might be the best option at the moment.

@tnir
Collaborator Author

tnir commented Jul 22, 2022

Before considering whether to show the logo, we need to check whether the search experience is good with Typesense Cloud. Once @jasonbosco creates (or allows me to create) a cluster, I will update #702 within minutes.

@tnir tnir added the performance label Jul 22, 2022
@jasonbosco

@tnir I've added some credits to the bundler-io account on Typesense Cloud. If you switch to it, you should now be able to provision a 5-region SDN cluster with the config I mentioned above.

@hsbt
Member

hsbt commented Jul 22, 2022

I agree with @simi's opinion.

SaaS for an OSS project is difficult. I have a lot of experience with abandoned repositories. Sometimes I got an unexpected charge, migrated the service to Heroku or AWS, and maintained it myself.

At the very least, we should choose a technical stack that can be migrated to OSS alternatives.

@tnir tnir added this to the Architecture overhaul milestone Jul 23, 2022
@tnir tnir assigned tnir and unassigned jasonbosco and tnir Jul 23, 2022
@tnir tnir removed this from the Architecture overhaul H2 2022 milestone Dec 30, 2022