I wonder how do you manage to store the data #128

Closed
yuhong opened this issue Feb 6, 2024 · 6 comments


@yuhong

yuhong commented Feb 6, 2024

"By default, we do store some usage statistics in order to improve the search results. Specifically the following information is stored for each search:"
I wonder how you manage to store the data (using spinning rust, for example).

@mikkeldenker
Member

I am not sure I know exactly what you mean by 'spinning rust example', but I will do my best to provide some insights here.

If the user hasn't disabled it, we store the query text that was used for the search, a timestamp rounded down to the nearest hour, and which result (if any) was clicked. We don't store anything that can tie the search back to you. All the data is also automatically deleted after 90 days. The data is stored in a Scylla database which runs on a 4U server in a basement here in Copenhagen:

[Image: 11zon_IMG_3063]
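
As a rough illustration of the stored record described above (query text, a timestamp rounded down to the hour, an optional clicked result, and 90-day expiry), here is a minimal Python sketch. It is not the project's actual code or schema: `SearchRecord`, `make_record`, `is_expired`, and the field names are hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention window taken from the comment above: data is deleted after 90 days.
RETENTION = timedelta(days=90)
# The same window in seconds, the unit a database-side TTL would typically use.
RETENTION_SECONDS = int(RETENTION.total_seconds())


def round_down_to_hour(ts: datetime) -> datetime:
    """Drop minutes, seconds and microseconds so only the hour remains."""
    return ts.replace(minute=0, second=0, microsecond=0)


@dataclass(frozen=True)
class SearchRecord:
    # Hypothetical record layout; no user identifier is stored.
    query: str
    hour: datetime                 # timestamp rounded down to the hour
    clicked_result: Optional[int]  # index of the clicked result, if any


def make_record(query: str, ts: datetime,
                clicked_result: Optional[int] = None) -> SearchRecord:
    """Build a record, coarsening the timestamp before it is stored."""
    return SearchRecord(query, round_down_to_hour(ts), clicked_result)


def is_expired(record: SearchRecord, now: datetime) -> bool:
    """True once the record is older than the retention window."""
    return now - record.hour > RETENTION


if __name__ == "__main__":
    ts = datetime(2024, 2, 6, 14, 37, 52, tzinfo=timezone.utc)
    record = make_record("example query", ts, clicked_result=2)
    print(record.hour.isoformat())
    print(is_expired(record, ts + timedelta(days=91)))
```

Rounding before the write means the precise search time never reaches storage. In practice the expiry would more likely be delegated to the database (for example a per-row TTL of `RETENTION_SECONDS`) than checked in application code, but that is an assumption about the setup, not something stated in this thread.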

We also have some bare metal servers at Hetzner in Frankfurt which are used for a self-hosted S3, crawling, indexing and a bit of search. We will probably move more of the infrastructure to bare metal nodes at Hetzner in the future, including the Scylla database.

I'll close this issue here as there is no action for us to take, but if you have more questions please feel free to add them here.

@yuhong
Author

yuhong commented Feb 6, 2024

I mean, would you use spinning rust or SSDs for this data, for example? I can't imagine storing data on thousands of searches per second would be very practical without spinning rust (even if it was just for 90 days).

@mikkeldenker
Member

Oh! HDDs are fine. It's not something that's used live for each search, so it's okay that the speed is not as high. The search index needs to be stored on fast SSDs though.

@yuhong
Author

yuhong commented Feb 6, 2024

Not the point though. You will notice Marginalia Search managed to run their servers without dealing with spinning rust. Keep in mind that spinning rust fails more often than SSDs.

@yuhong
Author

yuhong commented Feb 8, 2024

"Well, currently we don't. We are bootstrapped and trying to keep costs low. In the future we will have, clearly labelled, contextual ads based on your current search query and a subscription option without ads. Just to re-iterate; we will only use your current search to match ads and will never track you across searches."
I hope you won't have to resort to the CPU cost of serving ads.

@mikkeldenker
Member

I answered your question in #132
