New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

L5 searchable docs #18

Closed
wants to merge 31 commits into
from

Conversation

Projects
None yet
@mattstauffer
Contributor

mattstauffer commented Feb 22, 2015

This is not necessarily complete, but I want to PR it so you can see it and give some feedback. You can merge as-is, or give feedback and I'll keep working before we call this PR ready-to-merge--whichever.

Steps to use this:

Install ElasticSearch.

Homestead: https://bosnadev.com/2014/09/12/install-elasticsearch-on-laravel-homestead/
Forge: http://forgerecipes.com/recipes/29

Limit ElasticSearch RAM Usage

On my 1GB server I set it to 512MB, but you can choose what's best based on the usage.

  1. Edit /etc/default/elasticsearch and set ES_HEAPSIZE to 512m (or whatever you want)
  2. Restart ElasticSearch using sudo /etc/init.d/elasticsearch restart

Run php artisan docs:index

This relies on your resources/docs/* Markdown folders being available locally.

That's it!

Original UI:

image

Current UI:

image

It's also (currently) testable here: http://elastic.fiveminutegeekshow.com/docs/

Show outdated Hide outdated app/Commands/Docs/IndexAllDocuments.php
/**
* Create a new command instance.
*/
public function __construct()

This comment has been minimized.

@GrahamCampbell

GrahamCampbell Feb 22, 2015

Member

Don't need that. :)

@GrahamCampbell

GrahamCampbell Feb 22, 2015

Member

Don't need that. :)

This comment has been minimized.

@mattstauffer

mattstauffer Feb 22, 2015

Contributor

True! I just left the defaults around. Thanks graham! Will fix and push later

@mattstauffer

mattstauffer Feb 22, 2015

Contributor

True! I just left the defaults around. Thanks graham! Will fix and push later

Show outdated Hide outdated app/Commands/Docs/InitializeElasticSearchIndex.php
/**
* Create a new command instance.
*/
public function __construct()

This comment has been minimized.

@GrahamCampbell

GrahamCampbell Feb 22, 2015

Member

Don't need that. :)

@GrahamCampbell

GrahamCampbell Feb 22, 2015

Member

Don't need that. :)

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 22, 2015

Contributor

Note that there are some @todos in line that I'd like to fix but I'd also welcome input. And I don't care whether we choose to address them in this PR or a future one.

Contributor

mattstauffer commented Feb 22, 2015

Note that there are some @todos in line that I'd like to fix but I'd also welcome input. And I don't care whether we choose to address them in this PR or a future one.

@montogeek

This comment has been minimized.

Show comment
Hide comment
@montogeek

montogeek Feb 23, 2015

Contributor

I want this, attach the results page please

Contributor

montogeek commented Feb 23, 2015

I want this, attach the results page please

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 23, 2015

Contributor

@montogeek I'm not sure what you're talking about?

Contributor

mattstauffer commented Feb 23, 2015

@montogeek I'm not sure what you're talking about?

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 23, 2015

Contributor

Realized I have no idea what power or load this ElasticSearch box will be under. Looking up whether there are options for load testing.

Contributor

mattstauffer commented Feb 23, 2015

Realized I have no idea what power or load this ElasticSearch box will be under. Looking up whether there are options for load testing.

@sepehr

This comment has been minimized.

Show comment
Hide comment
@sepehr

sepehr Feb 23, 2015

It's great!

sepehr commented Feb 23, 2015

It's great!

@montogeek

This comment has been minimized.

Show comment
Hide comment
@montogeek

montogeek Feb 23, 2015

Contributor

@mattstauffer I mean, how the results page looks.

Contributor

montogeek commented Feb 23, 2015

@mattstauffer I mean, how the results page looks.

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 23, 2015

Contributor

@montogeek There's a image embedded into the original PR that shows what the results page looks like.

Screenshot from above

Contributor

mattstauffer commented Feb 23, 2015

@montogeek There's a image embedded into the original PR that shows what the results page looks like.

Screenshot from above

@montogeek

This comment has been minimized.

Show comment
Hide comment
@montogeek

montogeek Feb 23, 2015

Contributor

Oh sorry, I didn't see it.
The results looks pretty good, I think the user do not care about the score. just the result.
What are you thoughts?

Contributor

montogeek commented Feb 23, 2015

Oh sorry, I didn't see it.
The results looks pretty good, I think the user do not care about the score. just the result.
What are you thoughts?

@jackmcdade

This comment has been minimized.

Show comment
Hide comment
@jackmcdade

jackmcdade Feb 23, 2015

Contributor

I agree. It's logical that they're in order of relevance, the actual score doesn't mean a heck of a lot.

Contributor

jackmcdade commented Feb 23, 2015

I agree. It's logical that they're in order of relevance, the actual score doesn't mean a heck of a lot.

@spamoom

This comment has been minimized.

Show comment
Hide comment
@spamoom

spamoom Feb 23, 2015

I think we should bin the score

spamoom commented Feb 23, 2015

I think we should bin the score

@jackmcdade

This comment has been minimized.

Show comment
Hide comment
@jackmcdade

jackmcdade Feb 23, 2015

Contributor

Also, I'd use a smaller header, or change it to "Results for "$term". It's a really long headline.

Contributor

jackmcdade commented Feb 23, 2015

Also, I'd use a smaller header, or change it to "Results for "$term". It's a really long headline.

@montogeek

This comment has been minimized.

Show comment
Hide comment
@montogeek

montogeek Feb 23, 2015

Contributor

What happen if the best match result is a subtitle or subsection of the page?

Contributor

montogeek commented Feb 23, 2015

What happen if the best match result is a subtitle or subsection of the page?

@polyfractal

This comment has been minimized.

Show comment
Hide comment
@polyfractal

polyfractal Feb 23, 2015

Hi! Author of the ES-PHP library here. Just wanted to drop off some notes:

Re: scoring:
  • I would also consider removing the score. The normalization is good (much better than raw Lucene scores), but in practice users still tend to freak out about scores for no reason. :)
  • Related, users tend to gain more value from highlighted fragments than scores. Makes it easier to judge relevance.
Re: load testing:
  • Make sure you use "real" queries that users might actually search for. E.g. use a randomized sample of query text, some short, some long. Resource usage is highly dependent on query pattern, so its easy to make false benchmarks if you use non-realistic queries (e.g. only using one query text. Lucene will keep those data structures hot and you'll see better latency than a more realistic load test).
  • Test for both burst capacity as well as steady state queries-per-second. They have different demands on the system, and in particular, bursts are where you will likely run into trouble if you are using more memory hungry features. Your queriers are basic full-text, so it should be less of a problem for you.
  • Indexing load is probably pretty light, so I probably wouldn't bother including that in your tests
  • What kind of hardware resources is ES running on? Based on your query, you aren't doing much in the way of heavy memory usage (aggregations, sorting, etc), so if your server is crunched for memory, you could consider reducing the filter / field data cache sizes and reducing the overall heap. That'd move more memory to the OS page cache, which Lucene will happily use for speeding up basic full-text search.
  • Also, it's good to identify if you are optimizing for throughput, or latency. Latency can be addressed by adding more primary shards and more machines (which means more machines taking part in a single query). Throughput can be addressed by adding more replicas and more machines (which means more machines simultaneously working on concurrent queries).

All that said, your dataset is probably pretty small, so I would expect a moderate box to handle a reasonably high load well. Many of my tips are probably overkill :) Basic full-text search is very optimized and requires relatively few resources for a surprising amount of qps.

Hi! Author of the ES-PHP library here. Just wanted to drop off some notes:

Re: scoring:
  • I would also consider removing the score. The normalization is good (much better than raw Lucene scores), but in practice users still tend to freak out about scores for no reason. :)
  • Related, users tend to gain more value from highlighted fragments than scores. Makes it easier to judge relevance.
Re: load testing:
  • Make sure you use "real" queries that users might actually search for. E.g. use a randomized sample of query text, some short, some long. Resource usage is highly dependent on query pattern, so its easy to make false benchmarks if you use non-realistic queries (e.g. only using one query text. Lucene will keep those data structures hot and you'll see better latency than a more realistic load test).
  • Test for both burst capacity as well as steady state queries-per-second. They have different demands on the system, and in particular, bursts are where you will likely run into trouble if you are using more memory hungry features. Your queriers are basic full-text, so it should be less of a problem for you.
  • Indexing load is probably pretty light, so I probably wouldn't bother including that in your tests
  • What kind of hardware resources is ES running on? Based on your query, you aren't doing much in the way of heavy memory usage (aggregations, sorting, etc), so if your server is crunched for memory, you could consider reducing the filter / field data cache sizes and reducing the overall heap. That'd move more memory to the OS page cache, which Lucene will happily use for speeding up basic full-text search.
  • Also, it's good to identify if you are optimizing for throughput, or latency. Latency can be addressed by adding more primary shards and more machines (which means more machines taking part in a single query). Throughput can be addressed by adding more replicas and more machines (which means more machines simultaneously working on concurrent queries).

All that said, your dataset is probably pretty small, so I would expect a moderate box to handle a reasonably high load well. Many of my tips are probably overkill :) Basic full-text search is very optimized and requires relatively few resources for a surprising amount of qps.

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 24, 2015

Contributor

I agree, I've never loved showing the score. I didn't know if we had a better way to show it, but I'm pretty happy to always defer to @jackmcdade :)

Update without the score, and with "Results":

image

In terms of displaying fragments, I had originally planned to do that work before PR-ing this, but I have very little experiencing breaking a document up into sub-sections. I had some ideas, but would love input: has anyone ever had experience breaking up a document so you can A) get more nuanced results and B) show the fragments in the response?

Contributor

mattstauffer commented Feb 24, 2015

I agree, I've never loved showing the score. I didn't know if we had a better way to show it, but I'm pretty happy to always defer to @jackmcdade :)

Update without the score, and with "Results":

image

In terms of displaying fragments, I had originally planned to do that work before PR-ing this, but I have very little experiencing breaking a document up into sub-sections. I had some ideas, but would love input: has anyone ever had experience breaking up a document so you can A) get more nuanced results and B) show the fragments in the response?

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 24, 2015

Contributor

Also: @polyfractal, you are a bawss. That is such helpful information--thanks! I have no idea what server infrastructure this is running on at the moment--it's whatever Taylor has Laravel.com running on. :)

Contributor

mattstauffer commented Feb 24, 2015

Also: @polyfractal, you are a bawss. That is such helpful information--thanks! I have no idea what server infrastructure this is running on at the moment--it's whatever Taylor has Laravel.com running on. :)

@montogeek

This comment has been minimized.

Show comment
Hide comment
@montogeek

montogeek Feb 24, 2015

Contributor

Laravel.com is running on a Forge instance.

Great work @mattstauffer ! Another request, shows the query on the search input :).
And about the subsections, I think that is too much for now, maybe in a coming release!

Contributor

montogeek commented Feb 24, 2015

Laravel.com is running on a Forge instance.

Great work @mattstauffer ! Another request, shows the query on the search input :).
And about the subsections, I think that is too much for now, maybe in a coming release!

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 24, 2015

Contributor

@montogeek Yes, but Forge can run many different sizes of servers. So that only tells us the architecture, not the power.

I've been unsure whether or not it's wise to put the previous search in the query box.

And I agree, the subsections may be too much, which is why I put this up here. But if someone says, "That's easy, read this blog post", I'd be happy to go ahead.

Contributor

mattstauffer commented Feb 24, 2015

@montogeek Yes, but Forge can run many different sizes of servers. So that only tells us the architecture, not the power.

I've been unsure whether or not it's wise to put the previous search in the query box.

And I agree, the subsections may be too much, which is why I put this up here. But if someone says, "That's easy, read this blog post", I'd be happy to go ahead.

@montogeek

This comment has been minimized.

Show comment
Hide comment
@montogeek

montogeek Feb 24, 2015

Contributor

Right about Forge.
Is like Google, shows the user his input.

Contributor

montogeek commented Feb 24, 2015

Right about Forge.
Is like Google, shows the user his input.

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 24, 2015

Contributor

Currently learning about ES Highlights. 👍

Contributor

mattstauffer commented Feb 24, 2015

Currently learning about ES Highlights. 👍

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 24, 2015

Contributor

OK, made it this far in ES highlights. Back at it again tomorrow.

image

Contributor

mattstauffer commented Feb 24, 2015

OK, made it this far in ES highlights. Back at it again tomorrow.

image

@polyfractal

This comment has been minimized.

Show comment
Hide comment
@polyfractal

polyfractal Feb 24, 2015

@mattstauffer Happy to help, feel free to ping me if you have a question about syntax or relevance scoring or whatever. I'll leave the UI design to you all...front-end is not my forte by any means :)

Great work btw, really happy to see ES search being integrated into Laravel's site!

@mattstauffer Happy to help, feel free to ping me if you have a question about syntax or relevance scoring or whatever. I'll leave the UI design to you all...front-end is not my forte by any means :)

Great work btw, really happy to see ES search being integrated into Laravel's site!

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 25, 2015

Contributor

@polyfractal I'm struggling to get the fragments more useful. I'm trying to follow this to see if FVH will help, but feeling completely lost.

here's where I am:

image

As you can see, I'm really struggling to get the fragments useful. Any ideas?

Contributor

mattstauffer commented Feb 25, 2015

@polyfractal I'm struggling to get the fragments more useful. I'm trying to follow this to see if FVH will help, but feeling completely lost.

here's where I am:

image

As you can see, I'm really struggling to get the fragments useful. Any ideas?

@polyfractal

This comment has been minimized.

Show comment
Hide comment
@polyfractal

polyfractal Feb 25, 2015

I don't think FVH will help (and in fact, I would recommend avoiding FVH...it has a proclivity for exploding with stackoverflow exceptions). It will also double your index size.

I think the real problem is that the fragments are a mix of Title + Body, and without formatting they just run into each other and it looks confusing. If you look at the first result, the highlighted fragments correlate to a title hit, a TOC hit and a body hit. But they are all smooshed inline with no formatting, so it's hard to tell what you are reading. E.g. here is the corresponding page:

laravel

If you look at something like Github's search, they maintain formatting which makes it easier to read (especially code fragments):

screen shot 2015-02-25 at 7 24 49 am

In contrast, Google only supplies a single fragment...but that fragment is always taken from the body of the text (and they try to avoid code snippets imo):

screen shot 2015-02-25 at 7 28 14 am

On the other hand, StackOverflow doesn't format code snippets, but they do appear to try and keep fragments from only the body (and make the highlighted section very dense and relatively large):

screen shot 2015-02-25 at 7 38 49 am

So I would personally try something like this (with full knowledge that it's easier to say than do :) ):

  • Only show highlighted fragments from the body. If there are no body highlights, just show the article title and the first XX characters from the doc's body (because it will be obvious the title matched)
  • Try to avoid highlighting TOC-style data...is this in a separate field or the body?
  • Either format code snippets, or try to parse them out so they don't display at all.
  • Maybe try only one fragment? Github shows two, but always in the same "code block" panel, while Google, SO and others only shows one.

Edit: re-reading my reply...this could be a decent amount of work, might be worth punting on highlighting until v2?

I don't think FVH will help (and in fact, I would recommend avoiding FVH...it has a proclivity for exploding with stackoverflow exceptions). It will also double your index size.

I think the real problem is that the fragments are a mix of Title + Body, and without formatting they just run into each other and it looks confusing. If you look at the first result, the highlighted fragments correlate to a title hit, a TOC hit and a body hit. But they are all smooshed inline with no formatting, so it's hard to tell what you are reading. E.g. here is the corresponding page:

laravel

If you look at something like Github's search, they maintain formatting which makes it easier to read (especially code fragments):

screen shot 2015-02-25 at 7 24 49 am

In contrast, Google only supplies a single fragment...but that fragment is always taken from the body of the text (and they try to avoid code snippets imo):

screen shot 2015-02-25 at 7 28 14 am

On the other hand, StackOverflow doesn't format code snippets, but they do appear to try and keep fragments from only the body (and make the highlighted section very dense and relatively large):

screen shot 2015-02-25 at 7 38 49 am

So I would personally try something like this (with full knowledge that it's easier to say than do :) ):

  • Only show highlighted fragments from the body. If there are no body highlights, just show the article title and the first XX characters from the doc's body (because it will be obvious the title matched)
  • Try to avoid highlighting TOC-style data...is this in a separate field or the body?
  • Either format code snippets, or try to parse them out so they don't display at all.
  • Maybe try only one fragment? Github shows two, but always in the same "code block" panel, while Google, SO and others only shows one.

Edit: re-reading my reply...this could be a decent amount of work, might be worth punting on highlighting until v2?

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 25, 2015

Contributor

@KaneCohen thanks--sadly it's not running. You can see the ES install script here: http://forgerecipes.com/recipes/29

It hasn't reset yet today...

Contributor

mattstauffer commented Feb 25, 2015

@KaneCohen thanks--sadly it's not running. You can see the ES install script here: http://forgerecipes.com/recipes/29

It hasn't reset yet today...

@KaneCohen

This comment has been minimized.

Show comment
Hide comment
@KaneCohen

KaneCohen Feb 25, 2015

What about syslog? Anything helpful?

I've had to deal with dying Es and it was all related to the memory.

Maybe set max memory consumption via environment variables

export ES_MAX_MEM=256m

Just an example, set men use to something like a half of your RAM.

What about syslog? Anything helpful?

I've had to deal with dying Es and it was all related to the memory.

Maybe set max memory consumption via environment variables

export ES_MAX_MEM=256m

Just an example, set men use to something like a half of your RAM.

@polyfractal

This comment has been minimized.

Show comment
Hide comment
@polyfractal

polyfractal Feb 26, 2015

ES_HEAPSIZE can be used too...it sets both the min and max heap size to the same value (which is recommended).

If something is exploding in ES, the elasticsearch logs might have a sign too (e.g. OutOfMemoryException). Another common problem on new installations is running out of file descriptors (it'll start spamming your logs when you run out of FDs)

ES_HEAPSIZE can be used too...it sets both the min and max heap size to the same value (which is recommended).

If something is exploding in ES, the elasticsearch logs might have a sign too (e.g. OutOfMemoryException). Another common problem on new installations is running out of file descriptors (it'll start spamming your logs when you run out of FDs)

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 26, 2015

Contributor

@KaneCohen nothing in Syslog.. skimmed and also grepped elastic and couldn't find anything. But I can't say that I knew exactly what to look for.

I'll give it a day or two and see if I can see the thing happen; if so, I might need to do some load testing or something, and I'll try both the ES_MAX_MEM and also the ES_HEAPSIZE @polyfractal suggested. Hardest part is that I don't know how to trigger it yet.

Thanks guys.

Also, no problems in the elasticsearch logs.

Contributor

mattstauffer commented Feb 26, 2015

@KaneCohen nothing in Syslog.. skimmed and also grepped elastic and couldn't find anything. But I can't say that I knew exactly what to look for.

I'll give it a day or two and see if I can see the thing happen; if so, I might need to do some load testing or something, and I'll try both the ES_MAX_MEM and also the ES_HEAPSIZE @polyfractal suggested. Hardest part is that I don't know how to trigger it yet.

Thanks guys.

Also, no problems in the elasticsearch logs.

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 26, 2015

Contributor

OK, @KaneCohen and @polyfractal were right: After a few days of running, Java runs out of memory (on my 512MB or 1GB box). Errors: https://gist.github.com/mattstauffer/59e74d0a6b57c7999689

I'll set the ES_HEAPSIZE to half of the box's power and then let it run for a few days. I wish I knew exactly what was causing it to grow....

Contributor

mattstauffer commented Feb 26, 2015

OK, @KaneCohen and @polyfractal were right: After a few days of running, Java runs out of memory (on my 512MB or 1GB box). Errors: https://gist.github.com/mattstauffer/59e74d0a6b57c7999689

I'll set the ES_HEAPSIZE to half of the box's power and then let it run for a few days. I wish I knew exactly what was causing it to grow....

@polyfractal

This comment has been minimized.

Show comment
Hide comment
@polyfractal

polyfractal Feb 26, 2015

If you gist up curl -XGET localhost:9200/_nodes/stats after ES has been running for a while, I can take a look to see what's eating up mem. Your queries should be pretty memory lightweight, since they are just full-text search. Is the query in this PR the only query being run?

The default ES_HEAP_SIZE is set to 1gb, so if you were running on 512 or 1gb I imagine the JVM just freaked out at some point.

If you gist up curl -XGET localhost:9200/_nodes/stats after ES has been running for a while, I can take a look to see what's eating up mem. Your queries should be pretty memory lightweight, since they are just full-text search. Is the query in this PR the only query being run?

The default ES_HEAP_SIZE is set to 1gb, so if you were running on 512 or 1gb I imagine the JVM just freaked out at some point.

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 27, 2015

Contributor

@polyfractal Yah, there's almost no queries, and no cron, and no one's really using this site. BUT I just looked.. it's a 512MB box.. so that's probably it. I'll probably dump it and re-spin-up a 1GB box. Thanks--and I'll also paste some stats. Thanks for all your help!

Contributor

mattstauffer commented Feb 27, 2015

@polyfractal Yah, there's almost no queries, and no cron, and no one's really using this site. BUT I just looked.. it's a 512MB box.. so that's probably it. I'll probably dump it and re-spin-up a 1GB box. Thanks--and I'll also paste some stats. Thanks for all your help!

@KaneCohen

This comment has been minimized.

Show comment
Hide comment
@KaneCohen

KaneCohen Feb 27, 2015

Matt, just in case, set max memory usage to 512 on new box. Given chance Es would eat whole 1g of available memory (default value for max memory).

Matt, just in case, set max memory usage to 512 on new box. Given chance Es would eat whole 1g of available memory (default value for max memory).

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Feb 28, 2015

Contributor

OK, new server, 1GB, edited /etc/default/elasticsearch and set ES_HEAPSIZE to 512m.

Then restarted ES with sudo /etc/init.d/elasticsearch restart.

Will let it run for a few days, and try to check the stats command. Checking in in a few.

Contributor

mattstauffer commented Feb 28, 2015

OK, new server, 1GB, edited /etc/default/elasticsearch and set ES_HEAPSIZE to 512m.

Then restarted ES with sudo /etc/init.d/elasticsearch restart.

Will let it run for a few days, and try to check the stats command. Checking in in a few.

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
Contributor

mattstauffer commented Feb 28, 2015

@jasonmccallister

This comment has been minimized.

Show comment
Hide comment
@jasonmccallister

jasonmccallister Mar 3, 2015

This is fantastic! Would be a welcome feature to the docs IMHO.

This is fantastic! Would be a welcome feature to the docs IMHO.

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Mar 4, 2015

Contributor

OK, it's been 4 days and it's still running smoothly. That was it--just limiting the RAM usage. Updating the directions in the PR description and gonna call this ready. Thanks so much @KaneCohen and @polyfractal!

@taylorotwell: This beast is finally ready. 👏

Contributor

mattstauffer commented Mar 4, 2015

OK, it's been 4 days and it's still running smoothly. That was it--just limiting the RAM usage. Updating the directions in the PR description and gonna call this ready. Thanks so much @KaneCohen and @polyfractal!

@taylorotwell: This beast is finally ready. 👏

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Mar 4, 2015

Contributor

@themccallister Thanks Jason! :)

Contributor

mattstauffer commented Mar 4, 2015

@themccallister Thanks Jason! :)

Extract Search Hit functionality to a class, and extract search CSS t…
…o Sass.

Update ellipses to only append and prepend to fragments when fragment
isn't at the end/beginning of the body.
@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Mar 8, 2015

Contributor

OK, with that commit I'm actually done. 👍

Contributor

mattstauffer commented Mar 8, 2015

OK, with that commit I'm actually done. 👍

@GrahamCampbell

This comment has been minimized.

Show comment
Hide comment
@GrahamCampbell

GrahamCampbell Mar 8, 2015

lol, not the same at all :)

lol, not the same at all :)

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Mar 8, 2015

Owner

HA, yah, whoops. :) Spelling is difficult some times. ;)

Owner

mattstauffer replied Mar 8, 2015

HA, yah, whoops. :) Spelling is difficult some times. ;)

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Mar 8, 2015

Owner

This feels dirty. Do I have to do some magic to expose this private property for testing? Or make a method just for exposing it publicly? All options feel janky.

@adamwathan you are my testing guru unicorn, please teach me your ways

This feels dirty. Do I have to do some magic to expose this private property for testing? Or make a method just for exposing it publicly? All options feel janky.

@adamwathan you are my testing guru unicorn, please teach me your ways

This comment has been minimized.

Show comment
Hide comment
@adamwathan

adamwathan Mar 8, 2015

@mattstauffer I would probably just keep it private and test against the hardcoded string honestly.

There's some value in getting it from the Hit class to reduce maintenance costs if that string changes frequently, but there's also a case to be made for being explicit in the test, and also thinking of it as a sort of blackbox operation.

We do want to know that it starts with <span class="search-ellipsis">...</span>, rather than starts with whatever the Hit says it's supposed to start with. If you added a getEllipsisMarkup method or something and somehow there was a bug in that method itself, your tests might pass while the results contained different ellipsis markup than you were expecting.

So I think using the string directly in the tests and keeping the property private is the way to go, even though it introduces a bit of conceptual duplication. I don't think duplication in tests is an automatic evil until it starts costing you time in maintenance, because the value of being immediately clear and obvious and having everything that matters to the test contained within the one function is really valuable.

@mattstauffer I would probably just keep it private and test against the hardcoded string honestly.

There's some value in getting it from the Hit class to reduce maintenance costs if that string changes frequently, but there's also a case to be made for being explicit in the test, and also thinking of it as a sort of blackbox operation.

We do want to know that it starts with <span class="search-ellipsis">...</span>, rather than starts with whatever the Hit says it's supposed to start with. If you added a getEllipsisMarkup method or something and somehow there was a bug in that method itself, your tests might pass while the results contained different ellipsis markup than you were expecting.

So I think using the string directly in the tests and keeping the property private is the way to go, even though it introduces a bit of conceptual duplication. I don't think duplication in tests is an automatic evil until it starts costing you time in maintenance, because the value of being immediately clear and obvious and having everything that matters to the test contained within the one function is really valuable.

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Mar 9, 2015

Contributor

Frustratingly, the indexer test echoes the "Indexing artisan.md" line mid-tests. Not sure how to get rid of that; googled around and no luck.

Contributor

mattstauffer commented Mar 9, 2015

Frustratingly, the indexer test echoes the "Indexing artisan.md" line mid-tests. Not sure how to get rid of that; googled around and no luck.

@jasonmccallister

This comment has been minimized.

Show comment
Hide comment
@jasonmccallister

jasonmccallister Apr 9, 2015

Pinging everyone about this pull request. @mattstauffer is that the last hang up? I might pull it down tomorrow and try my hand at it.

Pinging everyone about this pull request. @mattstauffer is that the last hang up? I might pull it down tomorrow and try my hand at it.

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Apr 10, 2015

Contributor

@themccallister No hangups. I mean, sure, feel free to fix that, but the only thing remaining here is @taylorotwell's approval & merging when he has time and energy to get the server up and running.

Contributor

mattstauffer commented Apr 10, 2015

@themccallister No hangups. I mean, sure, feel free to fix that, but the only thing remaining here is @taylorotwell's approval & merging when he has time and energy to get the server up and running.

@montogeek

This comment has been minimized.

Show comment
Hide comment
@montogeek

montogeek Apr 22, 2015

Contributor

What happened with this?

Contributor

montogeek commented Apr 22, 2015

What happened with this?

@Craytor

This comment has been minimized.

Show comment
Hide comment
@Craytor

Craytor Apr 29, 2015

Why hasn't this been completed yet? I really think the docs needs this functionality.

Craytor commented Apr 29, 2015

Why hasn't this been completed yet? I really think the docs needs this functionality.

@Wiejeben

This comment has been minimized.

Show comment
Hide comment
@Wiejeben

Wiejeben May 7, 2015

+1, way better then having to go to Google all the time and ending up in older docs

Wiejeben commented May 7, 2015

+1, way better then having to go to Google all the time and ending up in older docs

@maxiloc maxiloc referenced this pull request May 27, 2015

Closed

Add documentation search #37

@taylorotwell

This comment has been minimized.

Show comment
Hide comment
@taylorotwell

taylorotwell Jun 6, 2015

Member

Closing this for now in light of Algolia search PR.

Member

taylorotwell commented Jun 6, 2015

Closing this for now in light of Algolia search PR.

@mattstauffer

This comment has been minimized.

Show comment
Hide comment
@mattstauffer

mattstauffer Jun 8, 2015

Contributor

For those who asked, the folks at Algolia took this pull request and then re-worked it to use Algolia search, which means A) Taylor doesn't have to maintain his own Elastic Search server, and B) we get a pretty UI for free. 👍 #37

Contributor

mattstauffer commented Jun 8, 2015

For those who asked, the folks at Algolia took this pull request and then re-worked it to use Algolia search, which means A) Taylor doesn't have to maintain his own Elastic Search server, and B) we get a pretty UI for free. 👍 #37

@tfevens

This comment has been minimized.

Show comment
Hide comment
@tfevens

tfevens Jun 8, 2015

#win-win 👍

tfevens commented Jun 8, 2015

#win-win 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment