Skip to content

Conversation

@amhsirak
Copy link
Member

@amhsirak amhsirak commented Dec 5, 2024

@RohitR311 pls confirm maxun-core version before release.

Summary by CodeRabbit

  • New Features

    • Introduced version checking functionality in the NavBar component, notifying users of available updates.
    • Enhanced error handling and state management in the IntegrationSettings component.
    • Improved URL handling in the scraping functionality for more accurate data retrieval.
    • Enhanced workflow management with the addition of generated flags for workflows.
    • Integrated error handling improvements across various API endpoints related to recordings and runs.
    • Added a capture stage tracking feature to manage the state of the capture process.
  • Bug Fixes

    • Improved error handling and logging for various API endpoints related to recordings and runs.
  • Documentation

    • Updated interfaces and component structures for better clarity and maintainability.
  • Chores

    • Incremented package versions and updated dependencies across multiple files.

@amhsirak amhsirak added the Type: New Release Releases for Maxun - Bug Fix, Feature, HotFix, etc. label Dec 5, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
docker-compose.yml (1)

46-46: Consider documenting release changes

Since this is a release PR (v0.0.4), it would be helpful to document the changes or link to release notes.

Consider adding a comment above the image line documenting the key changes in this release or providing a link to the changelog/release notes.

+    # Release v0.0.4 - Add link to release notes or brief changelog here
     image: getmaxun/maxun-backend:v0.0.4
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 31388bd and 4f781da.

📒 Files selected for processing (1)
  • docker-compose.yml (1 hunks)
🔇 Additional comments (1)
docker-compose.yml (1)

46-46: Verify the backend image availability

The version bump from v0.0.3 to v0.0.4 looks good. Let's verify the image exists in the registry.

✅ Verification successful

Image version v0.0.4 is available and actively used

The Docker image getmaxun/maxun-backend:v0.0.4 exists in Docker Hub and is in active state. It was pushed on December 4th, 2024, and was last pulled on December 5th, 2024, indicating it's being actively used.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence of the new backend image version in the registry
# Note: Using curl to check Docker Hub API for the image tag

curl -s -f "https://hub.docker.com/v2/repositories/getmaxun/maxun-backend/tags/v0.0.4" || echo "Image not found"

Length of output: 1201

@amhsirak amhsirak marked this pull request as draft December 5, 2024 09:19
@amhsirak amhsirak requested a review from RohitR311 December 5, 2024 22:05
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
maxun-core/src/interpret.ts (3)

557-566: Performance concern in pagination handling.

Using JSON.stringify for uniqueness checking can be expensive for large objects. Consider implementing a more efficient unique key generation strategy.

-const uniqueKey = JSON.stringify(item);
+const uniqueKey = Object.entries(item)
+  .sort(([a], [b]) => a.localeCompare(b))
+  .map(([k, v]) => `${k}:${v}`)
+  .join('|');
🧰 Tools
🪛 Biome (1.9.4)

[error] 560-565: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Unsafe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


630-644: Improve logging in getMatchingActionId.

The debug logging is verbose and could impact performance. Consider adding a debug flag check.

 private getMatchingActionId(workflow: Workflow, pageState: PageState, usedActions: string[]) {
   for (let actionId = workflow.length - 1; actionId >= 0; actionId--) {
     const step = workflow[actionId];
     const isApplicable = this.applicable(step.where, pageState, usedActions);
+    if (this.options.debug) {
       console.log("-------------------------------------------------------------");
       console.log(`Where:`, step.where);
       console.log(`Page state:`, pageState);
       console.log(`Match result: ${isApplicable}`);
       console.log("-------------------------------------------------------------");
+    }
     
     if (isApplicable) {
         return actionId;
     }
   }
 }

Line range hint 647-741: Critical: Potential memory leak in workflow execution.

The workflow copy and selectors arrays are continuously modified but never cleaned up, which could lead to memory issues during long-running workflows.

  1. Clear the arrays when the workflow completes
  2. Consider implementing a maximum workflow size check
 private async runLoop(p: Page, workflow: Workflow) {
   const workflowCopy: Workflow = JSON.parse(JSON.stringify(workflow));
+  const MAX_WORKFLOW_SIZE = 1000; // Adjust based on requirements
+  
+  if (workflowCopy.length > MAX_WORKFLOW_SIZE) {
+    throw new Error(`Workflow size exceeds maximum limit of ${MAX_WORKFLOW_SIZE}`);
+  }

   // ... existing code ...

   while (true) {
     // ... existing code ...
     
     if (!action) {
+      // Clean up
+      selectors = [];
+      workflowCopy.length = 0;
       return;
     }
   }
 }
server/src/browser-management/classes/RemoteBrowser.ts (2)

93-107: Add unit tests for normalizeUrl method

The normalizeUrl method is crucial for preventing navigation loops and ensuring consistent URL formatting. Adding unit tests will help ensure it handles all edge cases correctly.


112-119: Add unit tests for shouldEmitUrlChange method

The shouldEmitUrlChange method determines if a URL change is significant. Unit tests can validate its logic against various scenarios, improving reliability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1383193 and 59cee8e.

📒 Files selected for processing (2)
  • maxun-core/src/interpret.ts (12 hunks)
  • server/src/browser-management/classes/RemoteBrowser.ts (6 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
maxun-core/src/interpret.ts

[error] 233-233: Avoid the use of spread (...) syntax on accumulators.

Spread syntax should be avoided on accumulators (like those in .reduce) because it causes a time complexity of O(n^2).
Consider methods such as .splice or .push instead.

(lint/performance/noAccumulatingSpread)


[error] 560-565: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Unsafe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🔇 Additional comments (5)
maxun-core/src/interpret.ts (3)

151-168: Improved selector collection implementation.

The new implementation is more efficient as it:

  1. Uses Set for automatic deduplication
  2. Processes selectors in reverse order
  3. Returns early when selectors are found

560-565: ⚠️ Potential issue

Fix variable scope in switch statement.

The static analysis tool correctly flags a potential issue with variable scope in the switch statement.

 case 'clickNext':
+  {
     const pageResults = await page.evaluate((cfg) => window.scrapeList(cfg), config);
     
     // Filter out already scraped items
     const newResults = pageResults.filter(item => {
       const uniqueKey = JSON.stringify(item);
       if (scrapedItems.has(uniqueKey)) return false;
       scrapedItems.add(uniqueKey);
       return true;
     });
     
     allResults = allResults.concat(newResults);
     
     if (config.limit && allResults.length >= config.limit) {
       return allResults.slice(0, config.limit);
     }
+  }

Likely invalid or redundant comment.

🧰 Tools
🪛 Biome (1.9.4)

[error] 560-565: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Unsafe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


180-236: Verify the impact of URL override logic.

The URL override logic at lines 224-225 might cause issues if action.where.url is unexpectedly undefined or if the URL comparison is case-sensitive.

✅ Verification successful

Based on the search results, I can now provide a final response:

URL comparison logic is properly implemented with necessary safeguards

The URL override logic is safe because:

  • The code has a null check with action && action.where.url
  • The URL comparison includes an explicit check for "about:blank"
  • The where property and its URL field are properly typed and validated:
    • Joi schema validation enforces URL format: url: [Joi.string().uri(), regex]
    • WhereWhatPair interface properly defines the URL structure
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for potential URL comparison edge cases
ast-grep --pattern 'if ($condition && $_ !== $_) {
  $_ = $_;
}'

Length of output: 66


Script:

#!/bin/bash
# Search for URL comparison patterns in the codebase
rg "action.*where\.url.*!==" -A 2

# Check the type definition of action.where
ast-grep --pattern 'interface $_ {
  where: {
    $$$
  }
}'

# Search for URL-related error handling
rg "action.*where\.url" -A 3 -B 1

Length of output: 641


Script:

#!/bin/bash
# Check the type definitions and interfaces related to the action object
ast-grep --pattern 'type $_ = {
  where: $_;
}'

# Look for Action interface definition
ast-grep --pattern 'interface Action {
  $$$
}'

# Search for URL-related type checks or validations
rg "where\?: \{" -A 3

Length of output: 123


Script:

#!/bin/bash
# Search for where property definitions with different pattern
rg "where:\s*{" -A 3

# Look for type imports related to Action
rg "import.*Action" -A 1

# Check for URL property definitions
rg "url:.*" -A 1 -B 2

Length of output: 11700

🧰 Tools
🪛 Biome (1.9.4)

[error] 233-233: Avoid the use of spread (...) syntax on accumulators.

Spread syntax should be avoided on accumulators (like those in .reduce) because it causes a time complexity of O(n^2).
Consider methods such as .splice or .push instead.

(lint/performance/noAccumulatingSpread)

server/src/browser-management/classes/RemoteBrowser.ts (2)

18-18: Dependency version locking for 'idcac-playwright'

As previously noted, ensure that the idcac-playwright package version is locked in package.json to maintain release stability.

Run the following script to check the dependency version:

#!/bin/bash
# Description: Check the 'idcac-playwright' dependency version in package.json

grep '"idcac-playwright":' package.json

121-149: Refactored event listener setup improves code quality

Extracting event listener setup into setupPageEventListeners enhances code readability and adheres to the DRY principle by reducing duplication.

Comment on lines +472 to +479
const executeAction = async (invokee: any, methodName: string, args: any) => {
console.log("Executing action:", methodName, args);
if (!args || Array.isArray(args)) {
await (<any>invokee[methodName])(...(args ?? []));
} else {
await (<any>invokee[methodName])(args);
}
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider error handling in executeAction.

The executeAction function should include try-catch blocks to handle potential invocation errors gracefully.

 const executeAction = async (invokee: any, methodName: string, args: any) => {
   console.log("Executing action:", methodName, args);
+  try {
     if (!args || Array.isArray(args)) {
       await (<any>invokee[methodName])(...(args ?? []));
     } else {
       await (<any>invokee[methodName])(args);
     }
+  } catch (error) {
+    console.error(`Failed to execute ${methodName}:`, error);
+    throw error;
+  }
 };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const executeAction = async (invokee: any, methodName: string, args: any) => {
console.log("Executing action:", methodName, args);
if (!args || Array.isArray(args)) {
await (<any>invokee[methodName])(...(args ?? []));
} else {
await (<any>invokee[methodName])(args);
}
};
const executeAction = async (invokee: any, methodName: string, args: any) => {
console.log("Executing action:", methodName, args);
try {
if (!args || Array.isArray(args)) {
await (<any>invokee[methodName])(...(args ?? []));
} else {
await (<any>invokee[methodName])(args);
}
} catch (error) {
console.error(`Failed to execute ${methodName}:`, error);
throw error;
}
};

Comment on lines +132 to +148
// Handle page load events with retry mechanism
page.on('load', async () => {
const injectScript = async (): Promise<boolean> => {
try {
await page.waitForLoadState('networkidle', { timeout: 5000 });

await page.evaluate(getInjectableScript());
return true;
} catch (error: any) {
logger.log('warn', `Script injection attempt failed: ${error.message}`);
return false;
}
};

const success = await injectScript();
console.log("Script injection result:", success);
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Implement a retry mechanism for script injection

The code mentions a retry mechanism, but currently, the script injection is attempted only once. Consider implementing retries to handle transient failures during script injection.

You could modify the injectScript function as follows:

             const injectScript = async (): Promise<boolean> => {
+                const maxRetries = 3;
+                let attempts = 0;
+                while (attempts < maxRetries) {
+                    try {
+                        await page.waitForLoadState('networkidle', { timeout: 5000 });
+                        await page.evaluate(getInjectableScript());
+                        return true;
+                    } catch (error: any) {
+                        attempts++;
+                        logger.log('warn', `Script injection attempt ${attempts} failed: ${error.message}`);
+                        if (attempts >= maxRetries) {
+                            return false;
+                        }
+                    }
+                }
+                return false;
             };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Handle page load events with retry mechanism
page.on('load', async () => {
const injectScript = async (): Promise<boolean> => {
try {
await page.waitForLoadState('networkidle', { timeout: 5000 });
await page.evaluate(getInjectableScript());
return true;
} catch (error: any) {
logger.log('warn', `Script injection attempt failed: ${error.message}`);
return false;
}
};
const success = await injectScript();
console.log("Script injection result:", success);
});
// Handle page load events with retry mechanism
page.on('load', async () => {
const injectScript = async (): Promise<boolean> => {
const maxRetries = 3;
let attempts = 0;
while (attempts < maxRetries) {
try {
await page.waitForLoadState('networkidle', { timeout: 5000 });
await page.evaluate(getInjectableScript());
return true;
} catch (error: any) {
attempts++;
logger.log('warn', `Script injection attempt ${attempts} failed: ${error.message}`);
if (attempts >= maxRetries) {
return false;
}
}
}
return false;
};
const success = await injectScript();
console.log("Script injection result:", success);
});

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
src/context/browserActions.tsx (2)

Line range hint 3-3: Consider removing the empty string from CaptureStage type

The empty string in CaptureStage type could lead to ambiguous states since 'initial' already represents the starting state. Consider making it more strict by removing the empty string option.

-export type CaptureStage = 'initial' | 'pagination' | 'limit' | 'complete' | '';
+export type CaptureStage = 'initial' | 'pagination' | 'limit' | 'complete';

Also applies to: 8-8


55-57: Consider adding state transition validation

The current implementation allows any state transition without validation. This could lead to unexpected states if multiple actions are triggered simultaneously. Consider implementing a state machine to enforce valid transitions.

Example implementation:

const isValidTransition = (current: CaptureStage, next: CaptureStage): boolean => {
  const transitions: Record<CaptureStage, CaptureStage[]> = {
    'initial': ['pagination', 'complete'],
    'pagination': ['limit', 'complete'],
    'limit': ['complete'],
    'complete': ['initial']
  };
  return transitions[current]?.includes(next) ?? false;
};

// Usage in setCaptureStage
const setCaptureStage = (nextStage: CaptureStage) => {
  if (!isValidTransition(captureStage, nextStage)) {
    console.warn(`Invalid state transition from ${captureStage} to ${nextStage}`);
    return;
  }
  setStage(nextStage);
};

Also applies to: 63-66, 71-75, 77-83

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 59cee8e and b93712b.

📒 Files selected for processing (3)
  • server/src/workflow-management/classes/Generator.ts (2 hunks)
  • server/src/workflow-management/selector.ts (2 hunks)
  • src/context/browserActions.tsx (1 hunks)
🔇 Additional comments (7)
src/context/browserActions.tsx (2)

Line range hint 39-39: LGTM: State management is well-structured

The state initialization and management for captureStage follows React best practices with proper typing.


Line range hint 1-140: Verify integration with workflow management system

The new capture stage management needs to be properly integrated with the workflow management system mentioned in the AI summary.

✅ Verification successful

Integration of capture stage management is properly implemented

The verification shows that the new capture stage management is well integrated:

  • ActionDescriptionBox and RightSidePanel components properly consume and handle the capture stages
  • The capture flow is correctly implemented with stage transitions (initial → pagination → limit → complete)
  • UI elements are appropriately controlled based on the capture stage
  • No direct coupling with the workflow system is needed as it operates at a higher abstraction level

The changes are self-contained within the browser actions context and its consuming components, requiring no additional integration work.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for components consuming the capture stage
rg -l "useActionContext.*captureStage" 

# Check for any direct references to capture stages
rg -l "'initial'|'pagination'|'limit'|'complete'" --type tsx --type ts

# Look for potential workflow integration points
ast-grep --pattern 'workflow.$$$captureStage'

Length of output: 213


Script:

#!/bin/bash
# Check for components consuming the capture stage with proper file type
rg -l "useActionContext.*captureStage"

# Check for any direct references to capture stages
rg -l "'initial'|'pagination'|'limit'|'complete'" --type ts

# Look for workflow related files
fd -e ts -e tsx workflow

# Check for components using ActionContext
rg -l "useActionContext"

# Check for files importing browserActions context
rg -l "from ['\"]\./context/browserActions['\"]"

Length of output: 641


Script:

#!/bin/bash
# Check how captureStage is used in the components
rg "captureStage" src/components/molecules/ActionDescriptionBox.tsx -A 2 -B 2
rg "captureStage" src/components/organisms/RightSidePanel.tsx -A 2 -B 2

# Check workflow type definitions
cat maxun-core/src/types/workflow.ts

# Look for any state management in workflow files
rg "state" src/api/workflow.ts

Length of output: 4563

server/src/workflow-management/classes/Generator.ts (2)

544-544: LGTM: Proper integration of listSelector parameter

The addition of this.listSelector parameter to getElementInformation call is consistent with the updated function signature and improves selector generation accuracy.


572-574: LGTM: Consistent parameter passing

The changes maintain consistency by passing this.listSelector to both getRect and getElementInformation calls.

server/src/workflow-management/selector.ts (3)

99-101: ⚠️ Potential issue

Fix potential division by zero in overlap calculation

The calculation of significantOverlap could result in a division by zero if the parent element has zero area.

Apply this fix:

-            const significantOverlap =
-              (childRect.width * childRect.height) /
-              (parentRect.width * parentRect.height) > 0.5;
+            const parentArea = parentRect.width * parentRect.height;
+            const childArea = childRect.width * childRect.height;
+            let significantOverlap = false;
+            if (parentArea > 0) {
+              significantOverlap = (childArea / parentArea) > 0.5;
+            }

77-84: 🛠️ Refactor suggestion

Extract duplicated container tags array

The container tags array is duplicated in three locations. This violates the DRY principle and makes maintenance more difficult.

Extract the array to a constant at the file level:

+ const CONTAINER_TAGS = ['DIV', 'SECTION', 'ARTICLE', 'MAIN', 'HEADER', 'FOOTER', 'NAV', 'ASIDE', 
+   'ADDRESS', 'BLOCKQUOTE', 'DETAILS', 'DIALOG', 'FIGURE', 'FIGCAPTION', 'MAIN', 'MARK', 'SUMMARY', 'TIME',
+   'TABLE', 'THEAD', 'TBODY', 'TFOOT', 'TR', 'TH', 'TD', 'CAPTION', 'COLGROUP', 'COL', 'FORM', 'FIELDSET',
+   'LEGEND', 'LABEL', 'INPUT', 'BUTTON', 'SELECT', 'DATALIST', 'OPTGROUP', 'OPTION', 'TEXTAREA', 'OUTPUT',
+   'PROGRESS', 'METER', 'DETAILS', 'SUMMARY', 'MENU', 'MENUITEM', 'MENUITEM', 'APPLET', 'EMBED', 'OBJECT',
+   'PARAM', 'VIDEO', 'AUDIO', 'SOURCE', 'TRACK', 'CANVAS', 'MAP', 'AREA', 'SVG', 'IFRAME', 'FRAME', 'FRAMESET',
+   'LI', 'UL', 'OL', 'DL', 'DT', 'DD', 'HR', 'P', 'PRE', 'LISTING', 'PLAINTEXT', 'A'
+ ] as const;

Then use CONTAINER_TAGS in all three locations instead of duplicating the array.

Also applies to: 205-212, 917-924


940-942: ⚠️ Potential issue

Fix potential division by zero in getNonUniqueSelectors

Similar to the earlier issue, this calculation could result in a division by zero.

Apply the same fix here:

-        const significantOverlap =
-          (childRect.width * childRect.height) /
-          (parentRect.width * parentRect.height) > 0.5;
+        const parentArea = parentRect.width * parentRect.height;
+        const childArea = childRect.width * childRect.height;
+        let significantOverlap = false;
+        if (parentArea > 0) {
+          significantOverlap = (childArea / parentArea) > 0.5;
+        }

Likely invalid or redundant comment.

const startPaginationMode = () => {
setPaginationMode(true);
setCaptureStage('pagination');
socket?.emit('setGetList', { getList: false });
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for socket communication

The socket emit lacks error handling and could silently fail. Consider adding error handling and user feedback.

-socket?.emit('setGetList', { getList: false });
+try {
+  if (!socket) {
+    throw new Error('Socket connection not available');
+  }
+  socket.emit('setGetList', { getList: false });
+} catch (error) {
+  console.error('Failed to communicate with server:', error);
+  // Consider adding user feedback mechanism
+}

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +16 to +159
export const getElementInformation = async (
page: Page,
coordinates: Coordinates,
listSelector: string,
) => {
try {
const rect = await page.evaluate(
async ({ x, y }) => {
const el = document.elementFromPoint(x, y) as HTMLElement;
if (el) {
const { parentElement } = el;
// Match the logic in recorder.ts for link clicks
const element = parentElement?.tagName === 'A' ? parentElement : el;
const rectangle = element?.getBoundingClientRect();
// @ts-ignore
if (rectangle) {
return {
x: rectangle.x,
y: rectangle.y,
width: rectangle.width,
height: rectangle.height,
top: rectangle.top,
right: rectangle.right,
bottom: rectangle.bottom,
left: rectangle.left,
if (listSelector !== '') {
const elementInfo = await page.evaluate(
async ({ x, y }) => {
const el = document.elementFromPoint(x, y) as HTMLElement;
if (el) {
const { parentElement } = el;
const element = parentElement?.tagName === 'A' ? parentElement : el;
let info: {
tagName: string;
hasOnlyText?: boolean;
innerText?: string;
url?: string;
imageUrl?: string;
attributes?: Record<string, string>;
innerHTML?: string;
outerHTML?: string;
} = {
tagName: element?.tagName ?? '',
};
if (element) {
info.attributes = Array.from(element.attributes).reduce(
(acc, attr) => {
acc[attr.name] = attr.value;
return acc;
},
{} as Record<string, string>
);
}
// Gather specific information based on the tag
if (element?.tagName === 'A') {
info.url = (element as HTMLAnchorElement).href;
info.innerText = element.innerText ?? '';
} else if (element?.tagName === 'IMG') {
info.imageUrl = (element as HTMLImageElement).src;
} else {
info.hasOnlyText = element?.children?.length === 0 &&
element?.innerText?.length > 0;
info.innerText = element?.innerText ?? '';
}
info.innerHTML = element.innerHTML;
info.outerHTML = element.outerHTML;
return info;
}
}
},
{ x: coordinates.x, y: coordinates.y },
);
return rect;
return null;
},
{ x: coordinates.x, y: coordinates.y },
);
return elementInfo;
} else {
const elementInfo = await page.evaluate(
async ({ x, y }) => {
const originalEl = document.elementFromPoint(x, y) as HTMLElement;
if (originalEl) {
let element = originalEl;

const containerTags = ['DIV', 'SECTION', 'ARTICLE', 'MAIN', 'HEADER', 'FOOTER', 'NAV', 'ASIDE',
'ADDRESS', 'BLOCKQUOTE', 'DETAILS', 'DIALOG', 'FIGURE', 'FIGCAPTION', 'MAIN', 'MARK', 'SUMMARY', 'TIME',
'TABLE', 'THEAD', 'TBODY', 'TFOOT', 'TR', 'TH', 'TD', 'CAPTION', 'COLGROUP', 'COL', 'FORM', 'FIELDSET',
'LEGEND', 'LABEL', 'INPUT', 'BUTTON', 'SELECT', 'DATALIST', 'OPTGROUP', 'OPTION', 'TEXTAREA', 'OUTPUT',
'PROGRESS', 'METER', 'DETAILS', 'SUMMARY', 'MENU', 'MENUITEM', 'MENUITEM', 'APPLET', 'EMBED', 'OBJECT',
'PARAM', 'VIDEO', 'AUDIO', 'SOURCE', 'TRACK', 'CANVAS', 'MAP', 'AREA', 'SVG', 'IFRAME', 'FRAME', 'FRAMESET',
'LI', 'UL', 'OL', 'DL', 'DT', 'DD', 'HR', 'P', 'PRE', 'LISTING', 'PLAINTEXT', 'A'
];
while (element.parentElement) {
const parentRect = element.parentElement.getBoundingClientRect();
const childRect = element.getBoundingClientRect();

if (!containerTags.includes(element.parentElement.tagName)) {
break;
}

const fullyContained =
parentRect.left <= childRect.left &&
parentRect.right >= childRect.right &&
parentRect.top <= childRect.top &&
parentRect.bottom >= childRect.bottom;

const significantOverlap =
(childRect.width * childRect.height) /
(parentRect.width * parentRect.height) > 0.5;

if (fullyContained && significantOverlap) {
element = element.parentElement;
} else {
break;
}
}

let info: {
tagName: string;
hasOnlyText?: boolean;
innerText?: string;
url?: string;
imageUrl?: string;
attributes?: Record<string, string>;
innerHTML?: string;
outerHTML?: string;
} = {
tagName: element?.tagName ?? '',
};

if (element) {
info.attributes = Array.from(element.attributes).reduce(
(acc, attr) => {
acc[attr.name] = attr.value;
return acc;
},
{} as Record<string, string>
);
}

if (element?.tagName === 'A') {
info.url = (element as HTMLAnchorElement).href;
info.innerText = element.innerText ?? '';
} else if (element?.tagName === 'IMG') {
info.imageUrl = (element as HTMLImageElement).src;
} else {
info.hasOnlyText = element?.children?.length === 0 &&
element?.innerText?.length > 0;
info.innerText = element?.innerText ?? '';
}

info.innerHTML = element.innerHTML;
info.outerHTML = element.outerHTML;
return info;
}
return null;
},
{ x: coordinates.x, y: coordinates.y },
);
return elementInfo;
}
} catch (error) {
const { message, stack } = error as Error;
logger.log('error', `Error while retrieving selector: ${message}`);
logger.log('error', `Stack: ${stack}`);
console.error('Error while retrieving selector:', message);
console.error('Stack:', stack);
}
}
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance error handling in getElementInformation

The function has good logic for handling different element types, but the error handling could be improved:

  1. The error is logged but no error type is returned
  2. The function implicitly returns undefined on error

Consider applying this change:

  } catch (error) {
    const { message, stack } = error as Error;
    console.error('Error while retrieving selector:', message);
    console.error('Stack:', stack);
+   return {
+     error: message,
+     tagName: '',
+   };
  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export const getElementInformation = async (
page: Page,
coordinates: Coordinates,
listSelector: string,
) => {
try {
const rect = await page.evaluate(
async ({ x, y }) => {
const el = document.elementFromPoint(x, y) as HTMLElement;
if (el) {
const { parentElement } = el;
// Match the logic in recorder.ts for link clicks
const element = parentElement?.tagName === 'A' ? parentElement : el;
const rectangle = element?.getBoundingClientRect();
// @ts-ignore
if (rectangle) {
return {
x: rectangle.x,
y: rectangle.y,
width: rectangle.width,
height: rectangle.height,
top: rectangle.top,
right: rectangle.right,
bottom: rectangle.bottom,
left: rectangle.left,
if (listSelector !== '') {
const elementInfo = await page.evaluate(
async ({ x, y }) => {
const el = document.elementFromPoint(x, y) as HTMLElement;
if (el) {
const { parentElement } = el;
const element = parentElement?.tagName === 'A' ? parentElement : el;
let info: {
tagName: string;
hasOnlyText?: boolean;
innerText?: string;
url?: string;
imageUrl?: string;
attributes?: Record<string, string>;
innerHTML?: string;
outerHTML?: string;
} = {
tagName: element?.tagName ?? '',
};
if (element) {
info.attributes = Array.from(element.attributes).reduce(
(acc, attr) => {
acc[attr.name] = attr.value;
return acc;
},
{} as Record<string, string>
);
}
// Gather specific information based on the tag
if (element?.tagName === 'A') {
info.url = (element as HTMLAnchorElement).href;
info.innerText = element.innerText ?? '';
} else if (element?.tagName === 'IMG') {
info.imageUrl = (element as HTMLImageElement).src;
} else {
info.hasOnlyText = element?.children?.length === 0 &&
element?.innerText?.length > 0;
info.innerText = element?.innerText ?? '';
}
info.innerHTML = element.innerHTML;
info.outerHTML = element.outerHTML;
return info;
}
}
},
{ x: coordinates.x, y: coordinates.y },
);
return rect;
return null;
},
{ x: coordinates.x, y: coordinates.y },
);
return elementInfo;
} else {
const elementInfo = await page.evaluate(
async ({ x, y }) => {
const originalEl = document.elementFromPoint(x, y) as HTMLElement;
if (originalEl) {
let element = originalEl;
const containerTags = ['DIV', 'SECTION', 'ARTICLE', 'MAIN', 'HEADER', 'FOOTER', 'NAV', 'ASIDE',
'ADDRESS', 'BLOCKQUOTE', 'DETAILS', 'DIALOG', 'FIGURE', 'FIGCAPTION', 'MAIN', 'MARK', 'SUMMARY', 'TIME',
'TABLE', 'THEAD', 'TBODY', 'TFOOT', 'TR', 'TH', 'TD', 'CAPTION', 'COLGROUP', 'COL', 'FORM', 'FIELDSET',
'LEGEND', 'LABEL', 'INPUT', 'BUTTON', 'SELECT', 'DATALIST', 'OPTGROUP', 'OPTION', 'TEXTAREA', 'OUTPUT',
'PROGRESS', 'METER', 'DETAILS', 'SUMMARY', 'MENU', 'MENUITEM', 'MENUITEM', 'APPLET', 'EMBED', 'OBJECT',
'PARAM', 'VIDEO', 'AUDIO', 'SOURCE', 'TRACK', 'CANVAS', 'MAP', 'AREA', 'SVG', 'IFRAME', 'FRAME', 'FRAMESET',
'LI', 'UL', 'OL', 'DL', 'DT', 'DD', 'HR', 'P', 'PRE', 'LISTING', 'PLAINTEXT', 'A'
];
while (element.parentElement) {
const parentRect = element.parentElement.getBoundingClientRect();
const childRect = element.getBoundingClientRect();
if (!containerTags.includes(element.parentElement.tagName)) {
break;
}
const fullyContained =
parentRect.left <= childRect.left &&
parentRect.right >= childRect.right &&
parentRect.top <= childRect.top &&
parentRect.bottom >= childRect.bottom;
const significantOverlap =
(childRect.width * childRect.height) /
(parentRect.width * parentRect.height) > 0.5;
if (fullyContained && significantOverlap) {
element = element.parentElement;
} else {
break;
}
}
let info: {
tagName: string;
hasOnlyText?: boolean;
innerText?: string;
url?: string;
imageUrl?: string;
attributes?: Record<string, string>;
innerHTML?: string;
outerHTML?: string;
} = {
tagName: element?.tagName ?? '',
};
if (element) {
info.attributes = Array.from(element.attributes).reduce(
(acc, attr) => {
acc[attr.name] = attr.value;
return acc;
},
{} as Record<string, string>
);
}
if (element?.tagName === 'A') {
info.url = (element as HTMLAnchorElement).href;
info.innerText = element.innerText ?? '';
} else if (element?.tagName === 'IMG') {
info.imageUrl = (element as HTMLImageElement).src;
} else {
info.hasOnlyText = element?.children?.length === 0 &&
element?.innerText?.length > 0;
info.innerText = element?.innerText ?? '';
}
info.innerHTML = element.innerHTML;
info.outerHTML = element.outerHTML;
return info;
}
return null;
},
{ x: coordinates.x, y: coordinates.y },
);
return elementInfo;
}
} catch (error) {
const { message, stack } = error as Error;
logger.log('error', `Error while retrieving selector: ${message}`);
logger.log('error', `Stack: ${stack}`);
console.error('Error while retrieving selector:', message);
console.error('Stack:', stack);
}
}
};
export const getElementInformation = async (
page: Page,
coordinates: Coordinates,
listSelector: string,
) => {
try {
if (listSelector !== '') {
const elementInfo = await page.evaluate(
async ({ x, y }) => {
const el = document.elementFromPoint(x, y) as HTMLElement;
if (el) {
const { parentElement } = el;
const element = parentElement?.tagName === 'A' ? parentElement : el;
let info: {
tagName: string;
hasOnlyText?: boolean;
innerText?: string;
url?: string;
imageUrl?: string;
attributes?: Record<string, string>;
innerHTML?: string;
outerHTML?: string;
} = {
tagName: element?.tagName ?? '',
};
if (element) {
info.attributes = Array.from(element.attributes).reduce(
(acc, attr) => {
acc[attr.name] = attr.value;
return acc;
},
{} as Record<string, string>
);
}
// Gather specific information based on the tag
if (element?.tagName === 'A') {
info.url = (element as HTMLAnchorElement).href;
info.innerText = element.innerText ?? '';
} else if (element?.tagName === 'IMG') {
info.imageUrl = (element as HTMLImageElement).src;
} else {
info.hasOnlyText = element?.children?.length === 0 &&
element?.innerText?.length > 0;
info.innerText = element?.innerText ?? '';
}
info.innerHTML = element.innerHTML;
info.outerHTML = element.outerHTML;
return info;
}
return null;
},
{ x: coordinates.x, y: coordinates.y },
);
return elementInfo;
} else {
const elementInfo = await page.evaluate(
async ({ x, y }) => {
const originalEl = document.elementFromPoint(x, y) as HTMLElement;
if (originalEl) {
let element = originalEl;
const containerTags = ['DIV', 'SECTION', 'ARTICLE', 'MAIN', 'HEADER', 'FOOTER', 'NAV', 'ASIDE',
'ADDRESS', 'BLOCKQUOTE', 'DETAILS', 'DIALOG', 'FIGURE', 'FIGCAPTION', 'MAIN', 'MARK', 'SUMMARY', 'TIME',
'TABLE', 'THEAD', 'TBODY', 'TFOOT', 'TR', 'TH', 'TD', 'CAPTION', 'COLGROUP', 'COL', 'FORM', 'FIELDSET',
'LEGEND', 'LABEL', 'INPUT', 'BUTTON', 'SELECT', 'DATALIST', 'OPTGROUP', 'OPTION', 'TEXTAREA', 'OUTPUT',
'PROGRESS', 'METER', 'DETAILS', 'SUMMARY', 'MENU', 'MENUITEM', 'MENUITEM', 'APPLET', 'EMBED', 'OBJECT',
'PARAM', 'VIDEO', 'AUDIO', 'SOURCE', 'TRACK', 'CANVAS', 'MAP', 'AREA', 'SVG', 'IFRAME', 'FRAME', 'FRAMESET',
'LI', 'UL', 'OL', 'DL', 'DT', 'DD', 'HR', 'P', 'PRE', 'LISTING', 'PLAINTEXT', 'A'
];
while (element.parentElement) {
const parentRect = element.parentElement.getBoundingClientRect();
const childRect = element.getBoundingClientRect();
if (!containerTags.includes(element.parentElement.tagName)) {
break;
}
const fullyContained =
parentRect.left <= childRect.left &&
parentRect.right >= childRect.right &&
parentRect.top <= childRect.top &&
parentRect.bottom >= childRect.bottom;
const significantOverlap =
(childRect.width * childRect.height) /
(parentRect.width * parentRect.height) > 0.5;
if (fullyContained && significantOverlap) {
element = element.parentElement;
} else {
break;
}
}
let info: {
tagName: string;
hasOnlyText?: boolean;
innerText?: string;
url?: string;
imageUrl?: string;
attributes?: Record<string, string>;
innerHTML?: string;
outerHTML?: string;
} = {
tagName: element?.tagName ?? '',
};
if (element) {
info.attributes = Array.from(element.attributes).reduce(
(acc, attr) => {
acc[attr.name] = attr.value;
return acc;
},
{} as Record<string, string>
);
}
if (element?.tagName === 'A') {
info.url = (element as HTMLAnchorElement).href;
info.innerText = element.innerText ?? '';
} else if (element?.tagName === 'IMG') {
info.imageUrl = (element as HTMLImageElement).src;
} else {
info.hasOnlyText = element?.children?.length === 0 &&
element?.innerText?.length > 0;
info.innerText = element?.innerText ?? '';
}
info.innerHTML = element.innerHTML;
info.outerHTML = element.outerHTML;
return info;
}
return null;
},
{ x: coordinates.x, y: coordinates.y },
);
return elementInfo;
}
} catch (error) {
const { message, stack } = error as Error;
console.error('Error while retrieving selector:', message);
console.error('Stack:', stack);
return {
error: message,
tagName: '',
};
}
};

@amhsirak amhsirak merged commit a879078 into master Dec 11, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Type: New Release Releases for Maxun - Bug Fix, Feature, HotFix, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants