
improve error handling by using error-stack crate #104

Merged (4 commits) on Jun 15, 2023

Conversation

@xffxff (Contributor) commented Jun 14, 2023

What does this PR do?

This PR improves the errors generated while fetching results from the upstream search engines, making them more readable and giving them more context. It also makes the logs more verbose when the debug option is turned on, as proposed in issue #83.

Why is this change important?

This change is essential because it improves the readability of errors and gives them more context, which allows a better in-depth understanding of a problem/issue/bug when debugging the app.
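
For context, here is a minimal sketch of the error-stack pattern this PR adopts; the EngineError context type and the messages are illustrative stand-ins, not the exact types used in the PR:

use std::fmt;

use error_stack::{Report, Result};

/// Hypothetical context type standing in for the PR's real engine errors.
#[derive(Debug)]
struct EngineError;

impl fmt::Display for EngineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("error while fetching results from the upstream search engine")
    }
}

impl std::error::Error for EngineError {}

/// Wraps a low-level failure into an `error_stack::Report`, layering
/// human-readable context on top of the underlying error.
fn parse_page(raw: &str) -> Result<u32, EngineError> {
    raw.parse::<u32>().map_err(|e| {
        Report::new(e)
            .change_context(EngineError)
            .attach_printable(format!("invalid page number: {raw}"))
    })
}

fn main() {
    if let Err(report) = parse_page("not-a-number") {
        // `{:?}` on a `Report` prints the whole chain of contexts and
        // attachments, which is what makes the debug logs verbose and readable.
        eprintln!("{report:?}");
    }
}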

How to test this PR locally?

It can be tested by installing and running Websurfx as described in the docs and the README, then launching the browser and providing a query with a large page value:

http://127.0.0.1:8080/search?q=ls&page=10000000

Related issues

Closes #99 and closes #83.

Additional Notes

Additionally, once this PR is merged, we can create a new task for each engine search using tokio::spawn. As pointed out by @neon-mmd in this comment, we were unable to use tokio::spawn because the scraper::error::SelectorErrorKind error type returned by the result function does not implement Sync and Send. However, we have now addressed this issue by eliminating SelectorErrorKind and returning an UnexpectedError instead. Consequently, all errors returned by the result function are now thread-safe.
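
As a hedged illustration of why the new errors are spawn-friendly (the struct shape below is an assumption; only the UnexpectedError name comes from this PR): tokio::spawn requires the spawned future and its output to be Send + 'static, so an error type that owns its data, unlike the lifetime-bound scraper::error::SelectorErrorKind<'a>, satisfies the bound automatically.

use std::fmt;

/// A hypothetical owned error type standing in for the PR's `UnexpectedError`;
/// because it owns its data (no borrowed lifetime), it is automatically
/// `Send` and `Sync`.
#[derive(Debug)]
struct UnexpectedError(String);

impl fmt::Display for UnexpectedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unexpected error: {}", self.0)
    }
}

impl std::error::Error for UnexpectedError {}

/// Stand-in for an engine's `result` function.
async fn results() -> Result<Vec<String>, UnexpectedError> {
    Err(UnexpectedError("failed to parse the selector".into()))
}

#[tokio::main]
async fn main() {
    // `tokio::spawn` requires the future (and its output, including the error
    // type) to be `Send + 'static`; this now compiles because the error owns
    // all of its data.
    let handle = tokio::spawn(async { results().await });
    let _ = handle.await;
}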

@xffxff xffxff marked this pull request as ready for review June 14, 2023 13:24
@xffxff xffxff requested a review from neon-mmd June 14, 2023 13:24
Review thread on src/engines/engine_models.rs (outdated, resolved)
@neon-mmd (Owner) left a comment


Also, after testing your changes, it seems that the errors are not getting logged to stdout when they occur, so I would suggest making the following changes to fix this 🙂.

In file src/bin/websurfx.rs you will need to change the code to something like this:

//! Main module of the application
//!
//! This module contains the main function which handles the logging of the application to the
//! stdout and handles the command line arguments provided and launches the `websurfx` server.

use std::net::TcpListener;

use websurfx::{config_parser::parser::Config, run};

/// The function that launches the main server and registers all the routes of the website.
///
/// # Error
///
/// Returns an error if the port is being used by something else on the system and is not
/// available for being used for other applications.
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Initialize the parsed config file.
    let config = Config::parse().unwrap();

    // Initializing logging middleware to log all types of logging events including errors
    // and info-level events.
    if config.logging || config.debug {
        env_logger::init();
    }

    log::info!("started server on port {}", config.port);

    let listener = TcpListener::bind((config.binding_ip_addr.clone(), config.port))?;

    run(listener, config)?.await
}

In file src/search_results_handler/aggregator.rs you will need to change the code to something like this:

//! This module provides the functionality to scrape and gathers all the results from the upstream
//! search engines and then removes duplicate results.

use std::{collections::HashMap, time::Duration};

use rand::Rng;
use tokio::join;

use super::{
    aggregation_models::{RawSearchResult, SearchResult, SearchResults},
    user_agent::random_user_agent,
};

use crate::engines::{duckduckgo, searx};

/// A function that aggregates all the scraped results from the upstream search engines and
/// then removes duplicate results. If a result is found to come from two or more engines,
/// their engine names are merged to show which upstream engines the result was fetched from.
/// All data is then moved from the HashMap into a struct that holds the aggregated results
/// as a vector, together with the query used. This is necessary because otherwise the search
/// bar remains empty if the search is made from the query url.
///
/// # Example:
///
/// If you search from a url like `https://127.0.0.1/search?q=huston` then the search bar should
/// contain the word huston and not remain empty.
///
/// # Arguments
///
/// * `query` - Accepts a string to query with the above upstream search engines.
/// * `page` - Accepts a u32 page number.
/// * `random_delay` - Accepts a boolean value to add a random delay before making the request.
///
/// # Error
///
/// Returns a reqwest or scraping selector error if any error occurs in the results
/// function of either `searx` or `duckduckgo` or both; otherwise returns a `SearchResults`
/// struct containing appropriate values.
pub async fn aggregate(
    query: &str,
    page: u32,
    random_delay: bool,
    debug: bool,
) -> Result<SearchResults, Box<dyn std::error::Error>> {
    let user_agent: String = random_user_agent();
    let mut result_map: HashMap<String, RawSearchResult> = HashMap::new();

    // Add a random delay before making the request.
    if random_delay || !debug {
        let mut rng = rand::thread_rng();
        let delay_secs = rng.gen_range(1..10);
        std::thread::sleep(Duration::from_secs(delay_secs));
    }

    // fetch results from upstream search engines simultaneously/concurrently.
    let (ddg_map_results, searx_map_results) = join!(
        duckduckgo::results(query, page, &user_agent),
        searx::results(query, page, &user_agent)
    );

    let ddg_map_results: HashMap<String, RawSearchResult> = match ddg_map_results {
        Ok(result) => result,
        Err(e) => {
            if debug {
                // Log the verbose error with all the information if the debug option
                // has been enabled in the config file.
                log::error!("\n{e:?}");
            }
            HashMap::new()
        }
    };
    let searx_map_results: HashMap<String, RawSearchResult> = match searx_map_results {
        Ok(result) => result,
        Err(e) => {
            if debug {
                // Log the verbose error with all the information if the debug option
                // has been enabled in the config file.
                log::error!("\n{e:?}");
            }
            HashMap::new()
        }
    };

    result_map.extend(ddg_map_results);

    searx_map_results.into_iter().for_each(|(key, value)| {
        result_map
            .entry(key)
            .and_modify(|result| {
                result.add_engines(value.clone().engine());
            })
            .or_insert_with(|| -> RawSearchResult {
                RawSearchResult::new(
                    value.title.clone(),
                    value.visiting_url.clone(),
                    value.description.clone(),
                    value.engine.clone(),
                )
            });
    });

    Ok(SearchResults::new(
        result_map
            .into_iter()
            .map(|(key, value)| {
                SearchResult::new(
                    value.title,
                    value.visiting_url,
                    key,
                    value.description,
                    value.engine,
                )
            })
            .collect(),
        query.to_string(),
    ))
}

Now you can test the app as usual. One way to test the error handling is to open your browser, provide a large page value in the query, and check whether the expected error appears in the log:

http://127.0.0.1:8080/search?q=ls&page=10000000

xffxff and others added 2 commits June 15, 2023 06:27
Co-authored-by: neon_arch <mustafadhuleb53@gmail.com>
@xffxff (Contributor, Author) commented Jun 14, 2023

> In file src/bin/websurfx.rs you will need to change the code to something like this:

What is the reason for modifying the code in src/bin/websurfx.rs?

@xffxff xffxff requested a review from neon-mmd June 14, 2023 22:49
@neon-mmd (Owner) commented

> In file src/bin/websurfx.rs you will need to change the code to something like this:
>
> What is the reason for modifying the code in src/bin/websurfx.rs?

The reason to modify the code was to allow the env_logger to also include error-level messages in the logs on stdout. These were not appearing previously because the env_logger was initialized with a log level of info by default, which restricted it to only log info-level messages. Now, by modifying the code to not have any defaults and by initializing it directly, we do not add any restrictions/constraints on the env_logger, which allows it to log other levels of messages as well, including info and error level messages 🙂. What do you say??

@xffxff (Contributor, Author) commented Jun 15, 2023

> These were not appearing previously because the env_logger was initialized with a log level of info by default, which restricted it to only log info-level messages.

I think that setting the log level to info allows all logs with a level of info or higher to be printed to stdout. I have tested this locally, and it seems to work as expected. @neon-mmd
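
For reference, a small self-contained sketch of the standard log/env_logger filtering semantics (independent of this PR's code): a filter of Info admits Info, Warn, and Error records, since Error and Warn are higher severities than Info.

fn main() {
    // Set the filter level explicitly instead of relying on the `RUST_LOG`
    // environment variable.
    env_logger::Builder::new()
        .filter_level(log::LevelFilter::Info)
        .init();

    log::error!("printed: Error is above the Info threshold");
    log::warn!("printed: Warn is above the Info threshold");
    log::info!("printed: Info meets the threshold exactly");
    log::debug!("filtered out: Debug is below the Info threshold");
}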

@neon-mmd (Owner) commented

> These were not appearing previously because the env_logger was initialized with a log level of info by default, which restricted it to only log info-level messages.
>
> I think that setting the log level to info allows all logs with a level of info or higher to be printed to stdout. I have tested this locally, and it seems to work as expected. @neon-mmd

Ok, I see 👀, but just to get things right, I will test it one last time and see if it works, because we need to make sure it works right on all systems including ours. So just give me a few seconds; if everything works as expected, I will approve this PR and it will be all good and ready to be merged 🙂. What do you say??

@neon-mmd (Owner) left a comment


Ok, after testing the changes one last time 🙂, I found that it indeed works as you (@xffxff) suggested. It looks like it was a general misconfiguration of the system on my side. Since everything now works as expected and the changes look perfect, I am approving and merging this PR 🙂. Also, thanks for putting up this PR, I really appreciate it 🙂.

@neon-mmd neon-mmd merged commit 42686b9 into neon-mmd:rolling Jun 15, 2023
2 checks passed
Development

Successfully merging this pull request may close these issues.

- Improve error-handling by using error-stack crate
- Make logs more verbose when put in debug mode
3 participants