
improve error handling by using error-stack crate #104

Merged (4 commits) on Jun 15, 2023

Conversation

@xffxff (Contributor) commented Jun 14, 2023

What does this PR do?

This PR improves the errors generated while fetching results from the upstream search engines, making them more readable and giving them more context. It also makes the logs more verbose when the debug option is turned on, as proposed in issue #83.

Why is this change important?

This change is essential because it improves the readability of errors and gives them more context, which allows a better in-depth understanding of a problem/issue/bug when debugging the app.
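
For context, here is a minimal sketch of the error-stack pattern this PR adopts; the EngineError context type and the messages are illustrative stand-ins, not the exact types used in the PR:

use std::fmt;

use error_stack::{Report, Result};

/// Hypothetical context type standing in for the PR's real engine errors.
#[derive(Debug)]
struct EngineError;

impl fmt::Display for EngineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("error while fetching results from the upstream search engine")
    }
}

impl std::error::Error for EngineError {}

/// Wraps a low-level failure into an `error_stack::Report`, layering
/// human-readable context on top of the underlying error.
fn parse_page(raw: &str) -> Result<u32, EngineError> {
    raw.parse::<u32>().map_err(|e| {
        Report::new(e)
            .change_context(EngineError)
            .attach_printable(format!("invalid page number: {raw}"))
    })
}

fn main() {
    if let Err(report) = parse_page("not-a-number") {
        // `{:?}` on a `Report` prints the whole chain of contexts and
        // attachments, which is what makes the debug logs verbose and readable.
        eprintln!("{report:?}");
    }
}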

How to test this PR locally?

It can be tested by installing and running Websurfx as described in the docs and the README, then launching the browser and providing a query with a large page value:

http://127.0.0.1:8080/search?q=ls&page=10000000

Related issues

Closes #99 and closes #83.

Additional Notes

Additionally, once this PR is merged, we can create a new task for each engine search using tokio::spawn. As pointed out by @neon-mmd in this comment, we were unable to use tokio::spawn because the scraper::error::SelectorErrorKind error type returned by the result function does not implement Sync and Send. However, we have now addressed this issue by eliminating SelectorErrorKind and returning an UnexpectedError instead. Consequently, all errors returned by the result function are now thread-safe.
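
As a hedged illustration of why the new errors are spawn-friendly (the struct shape below is an assumption; only the UnexpectedError name comes from this PR): tokio::spawn requires the spawned future and its output to be Send + 'static, so an error type that owns its data, unlike the lifetime-bound scraper::error::SelectorErrorKind<'a>, satisfies the bound automatically.

use std::fmt;

/// A hypothetical owned error type standing in for the PR's `UnexpectedError`;
/// because it owns its data (no borrowed lifetime), it is automatically
/// `Send` and `Sync`.
#[derive(Debug)]
struct UnexpectedError(String);

impl fmt::Display for UnexpectedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unexpected error: {}", self.0)
    }
}

impl std::error::Error for UnexpectedError {}

/// Stand-in for an engine's `result` function.
async fn results() -> Result<Vec<String>, UnexpectedError> {
    Err(UnexpectedError("failed to parse the selector".into()))
}

#[tokio::main]
async fn main() {
    // `tokio::spawn` requires the future (and its output, including the error
    // type) to be `Send + 'static`; this now compiles because the error owns
    // all of its data.
    let handle = tokio::spawn(async { results().await });
    let _ = handle.await;
}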

@xffxff xffxff marked this pull request as ready for review June 14, 2023 13:24
@xffxff xffxff requested a review from neon-mmd June 14, 2023 13:24
Review thread on src/engines/engine_models.rs (outdated, resolved)
@neon-mmd (Owner) left a comment


Also, after testing your changes, it seems that the errors are not getting logged to stdout when they occur, so I would suggest making the following changes to fix this 🙂.

In file src/bin/websurfx.rs you will need to change the code to something like this:

//! Main module of the application
//!
//! This module contains the main function which handles the logging of the application to the
//! stdout and handles the command line arguments provided and launches the `websurfx` server.

use std::net::TcpListener;

use websurfx::{config_parser::parser::Config, run};

/// The function that launches the main server and registers all the routes of the website.
///
/// # Error
///
/// Returns an error if the port is being used by something else on the system and is not
/// available for being used for other applications.
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Initialize the parsed config file.
    let config = Config::parse().unwrap();

    // Initializing logging middleware to log all types of logging events including errors
    // and info-level events.
    if config.logging || config.debug {
        env_logger::init();
    }

    log::info!("started server on port {}", config.port);

    let listener = TcpListener::bind((config.binding_ip_addr.clone(), config.port))?;

    run(listener, config)?.await
}

In file src/search_results_handler/aggregator.rs you will need to change the code to something like this:

//! This module provides the functionality to scrape and gathers all the results from the upstream
//! search engines and then removes duplicate results.

use std::{collections::HashMap, time::Duration};

use rand::Rng;
use tokio::join;

use super::{
    aggregation_models::{RawSearchResult, SearchResult, SearchResults},
    user_agent::random_user_agent,
};

use crate::engines::{duckduckgo, searx};

/// A function that aggregates all the scraped results from the upstream search engines and
/// then removes duplicate results. If a result is found to come from two or more engines,
/// their engine names are merged to show which upstream engines the result was fetched from.
/// All data is then moved from the HashMap into a struct that holds the aggregated results
/// as a vector, together with the query used. This is necessary because otherwise the search
/// bar remains empty if the search is made from the query url.
///
/// # Example:
///
/// If you search from a url like `https://127.0.0.1/search?q=huston` then the search bar should
/// contain the word huston and not remain empty.
///
/// # Arguments
///
/// * `query` - Accepts a string to query with the above upstream search engines.
/// * `page` - Accepts a u32 page number.
/// * `random_delay` - Accepts a boolean value to add a random delay before making the request.
///
/// # Error
///
/// Returns a reqwest or scraping selector error if any error occurs in the results
/// function of either `searx` or `duckduckgo` or both; otherwise returns a `SearchResults`
/// struct containing appropriate values.
pub async fn aggregate(
    query: &str,
    page: u32,
    random_delay: bool,
    debug: bool,
) -> Result<SearchResults, Box<dyn std::error::Error>> {
    let user_agent: String = random_user_agent();
    let mut result_map: HashMap<String, RawSearchResult> = HashMap::new();

    // Add a random delay before making the request.
    if random_delay || !debug {
        let mut rng = rand::thread_rng();
        let delay_secs = rng.gen_range(1..10);
        std::thread::sleep(Duration::from_secs(delay_secs));
    }

    // fetch results from upstream search engines simultaneously/concurrently.
    let (ddg_map_results, searx_map_results) = join!(
        duckduckgo::results(query, page, &user_agent),
        searx::results(query, page, &user_agent)
    );

    let ddg_map_results: HashMap<String, RawSearchResult> = match ddg_map_results {
        Ok(result) => result,
        Err(e) => {
            if debug {
                // Log the verbose error with all the information if the debug option
                // has been enabled in the config file.
                log::error!("\n{e:?}");
            }
            HashMap::new()
        }
    };
    let searx_map_results: HashMap<String, RawSearchResult> = match searx_map_results {
        Ok(result) => result,
        Err(e) => {
            if debug {
                // Log the verbose error with all the information if the debug option
                // has been enabled in the config file.
                log::error!("\n{e:?}");
            }
            HashMap::new()
        }
    };

    result_map.extend(ddg_map_results);

    searx_map_results.into_iter().for_each(|(key, value)| {
        result_map
            .entry(key)
            .and_modify(|result| {
                result.add_engines(value.clone().engine());
            })
            .or_insert_with(|| -> RawSearchResult {
                RawSearchResult::new(
                    value.title.clone(),
                    value.visiting_url.clone(),
                    value.description.clone(),
                    value.engine.clone(),
                )
            });
    });

    Ok(SearchResults::new(
        result_map
            .into_iter()
            .map(|(key, value)| {
                SearchResult::new(
                    value.title,
                    value.visiting_url,
                    key,
                    value.description,
                    value.engine,
                )
            })
            .collect(),
        query.to_string(),
    ))
}

Now you can test the app as usual. One way to test the error handling is to open your browser, provide a large page value in the query, and check whether the expected error appears in the log:

http://127.0.0.1:8080/search?q=ls&page=10000000

xffxff and others added 2 commits June 15, 2023 06:27
Co-authored-by: neon_arch <mustafadhuleb53@gmail.com>
@xffxff (Contributor, Author) commented Jun 14, 2023

> In file src/bin/websurfx.rs you will need to change the code to something like this:

What is the reason for modifying the code in src/bin/websurfx.rs?

@xffxff xffxff requested a review from neon-mmd June 14, 2023 22:49
@neon-mmd (Owner) commented

> In file src/bin/websurfx.rs you will need to change the code to something like this:
>
> What is the reason for modifying the code in src/bin/websurfx.rs?

The reason to modify the code was to allow the env_logger to also include error-level messages in the logs on stdout. These were not appearing previously because the env_logger was initialized with a log level of info by default, which restricted it to only log info-level messages. Now, by modifying the code to not have any defaults and by initializing it directly, we do not add any restrictions/constraints on the env_logger, which allows it to log other levels of messages as well, including info and error level messages 🙂. What do you say??

@xffxff (Contributor, Author) commented Jun 15, 2023

> These were not appearing previously because the env_logger was initialized with a log level of info by default, which restricted it to only log info-level messages.

I think that setting the log level to info allows all logs with a level of info or higher to be printed to stdout. I have tested this locally, and it seems to work as expected. @neon-mmd
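
For reference, a small self-contained sketch of the standard log/env_logger filtering semantics (independent of this PR's code): a filter of Info admits Info, Warn, and Error records, since Error and Warn are higher severities than Info.

fn main() {
    // Set the filter level explicitly instead of relying on the `RUST_LOG`
    // environment variable.
    env_logger::Builder::new()
        .filter_level(log::LevelFilter::Info)
        .init();

    log::error!("printed: Error is above the Info threshold");
    log::warn!("printed: Warn is above the Info threshold");
    log::info!("printed: Info meets the threshold exactly");
    log::debug!("filtered out: Debug is below the Info threshold");
}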

@neon-mmd (Owner) commented

> These were not appearing previously because the env_logger was initialized with a log level of info by default, which restricted it to only log info-level messages.
>
> I think that setting the log level to info allows all logs with a level of info or higher to be printed to stdout. I have tested this locally, and it seems to work as expected. @neon-mmd

Ok, I see 👀, but just to get things right, I will test it one last time and see if it works, because we need to make sure it works right on all systems including ours. So just give me a few seconds; if everything works as expected, I will approve this PR and it will be all good and ready to be merged 🙂. What do you say??

@neon-mmd (Owner) left a comment


Ok, after testing the changes one last time 🙂, I found that it indeed works as you (@xffxff) suggested. It looks like it was a general misconfiguration of the system on my side. Since everything now works as expected and the changes look perfect, I am approving and merging this PR 🙂. Also, thanks for putting up this PR, I really appreciate it 🙂.

@neon-mmd neon-mmd merged commit 42686b9 into neon-mmd:rolling Jun 15, 2023
2 checks passed
Development

Successfully merging this pull request may close these issues.

- Improve error-handling by using error-stack crate
- Make logs more verbose when put in debug mode
3 participants