Merged
62 commits
6e2c583
feat: add cache logic
RohitR311 Sep 11, 2025
dad2fd0
fix: limit instruction highlighting
amhsirak Sep 13, 2025
6d6ef68
fix: limit instruct highlighting
RohitR311 Sep 16, 2025
0ad7666
fix: resolve merge conflicts
RohitR311 Sep 16, 2025
ad2830a
fix: disable inline images
RohitR311 Sep 16, 2025
2a1b178
fix: container parent filtering
RohitR311 Sep 16, 2025
0e32ff0
feat: use onCancel
amhsirak Sep 18, 2025
6edddbe
fix: !use onArrowBack
amhsirak Sep 18, 2025
1bbd2b7
fix: redirect to /integrate/googleSheets on auth
amhsirak Sep 18, 2025
76ce58f
fix: redirect to /robots on tab click
RohitR311 Sep 18, 2025
f885269
feat: persist recording id
RohitR311 Sep 19, 2025
8839226
feat: cache robot and run fetch data
RohitR311 Sep 19, 2025
6d25f26
feat: invalidate runs data
RohitR311 Sep 19, 2025
47eb3fc
chore: add tanstack react query
RohitR311 Sep 19, 2025
6c1d661
feat: invalidate runs during robot run
RohitR311 Sep 21, 2025
d2e0324
Merge pull request #781 from getmaxun/instruct-fix
amhsirak Sep 21, 2025
a0e6fe8
fix: instant discard redirect
amhsirak Sep 23, 2025
e6d5a34
fix: prioritize dialog elem sorting
RohitR311 Sep 24, 2025
903c318
feat: use h5 variant
amhsirak Sep 26, 2025
24a78f8
fix: remove font weight
amhsirak Sep 26, 2025
d1b6404
fix: cors navigation iframe rendering
RohitR311 Sep 28, 2025
0f75917
feat: add integration error handling
RohitR311 Sep 28, 2025
0aeb8ad
feat: map cleanup, process duration
RohitR311 Sep 28, 2025
c7ec3cf
fix: server uncaught err handling
RohitR311 Sep 28, 2025
24af62c
fix: socket cleanup, err handling
RohitR311 Sep 28, 2025
e75a10d
feat: add batch persistence logic
RohitR311 Sep 28, 2025
f77f42a
feat: add abort state getter
RohitR311 Sep 28, 2025
a83b69c
fix: add process retry count logic
RohitR311 Sep 28, 2025
47d1b24
fix: timeout mechanism, revamp working button logic
RohitR311 Sep 28, 2025
d98531b
feat: move robot create modal to page
RohitR311 Sep 29, 2025
79b8491
Merge pull request #782 from RohitR311/image-fix
amhsirak Sep 29, 2025
0e22b04
Merge pull request #783 from RohitR311/better-selgen
amhsirak Sep 29, 2025
b8fa4fd
Merge pull request #788 from getmaxun/fix-integrations-back
amhsirak Sep 29, 2025
8819c0a
Merge pull request #789 from getmaxun/fix-integration-auth
amhsirak Sep 29, 2025
6b81aa6
Merge pull request #790 from RohitR311/robtab-fix
amhsirak Sep 29, 2025
b61a177
Merge pull request #791 from RohitR311/cache-api-v2
amhsirak Sep 29, 2025
f36034c
Merge pull request #804 from getmaxun/discard-instant
amhsirak Sep 29, 2025
61f2087
Merge pull request #805 from getmaxun/highlight-fix
amhsirak Sep 29, 2025
79ea05b
Merge pull request #809 from getmaxun/robot-opt
amhsirak Sep 29, 2025
dc30e15
Merge pull request #810 from getmaxun/cors-fix
amhsirak Sep 29, 2025
f34e217
Merge pull request #811 from getmaxun/optim-record
amhsirak Sep 29, 2025
f4ab7dc
Merge pull request #812 from getmaxun/crearob-page
amhsirak Sep 29, 2025
ca65cbe
chore: install tanstack react query
amhsirak Sep 29, 2025
2a1cde5
chore: core v0.0.24
amhsirak Sep 29, 2025
0ced4cc
chore: maxun v0.0.24
amhsirak Sep 29, 2025
b1c3032
chore: use maxun-core v0.0.24
amhsirak Sep 29, 2025
7155b74
chore: remove discount code
amhsirak Sep 29, 2025
e3e735b
feat: wrap video tutorials and docs inside one modal
amhsirak Sep 29, 2025
8badf08
fix: remove tutorials tab
amhsirak Sep 29, 2025
3fbe644
fix: remove cloud modal
amhsirak Sep 29, 2025
5bb2b87
fix: rm deepest elem dialog filtering
RohitR311 Sep 29, 2025
b5e1277
fix: remove cloud modal
amhsirak Sep 29, 2025
5a45160
fix: remove unusedd state
amhsirak Sep 29, 2025
d1f6a9f
fix: lint
amhsirak Sep 29, 2025
a60df7f
Merge pull request #813 from getmaxun/pre-release-24
amhsirak Sep 29, 2025
0a914b0
fix: rm Run require model
RohitR311 Sep 29, 2025
6b7a1f8
fix: socket conn handling
RohitR311 Sep 29, 2025
e4c5237
Merge pull request #814 from getmaxun/dialog-fix
RohitR311 Sep 29, 2025
9a2d60e
fix: render run immediately
RohitR311 Sep 29, 2025
e916633
chore: remove ts-node dependency for server, use tsc + node (#107)
Aman-Raj-bat Oct 2, 2025
d410f27
Merge pull request #815 from Aman-Raj-bat/feat/remove-ts-node
amhsirak Oct 3, 2025
95a4d3c
chore: add use-cases
amhsirak Oct 3, 2025
12 changes: 5 additions & 7 deletions README.md
@@ -123,10 +123,6 @@ Maxun lets you create custom robots which emulate user actions and extract data.
 2. Capture Text: Useful to extract individual text content from the website.
 3. Capture Screenshot: Get fullpage or visible section screenshots of the website.
 
-## 2. BYOP
-BYOP (Bring Your Own Proxy) lets you connect external proxies to bypass anti-bot protection. Currently, the proxies are per user. Soon you'll be able to configure proxy per robot.
-
-
 # Features
 - ✨ Extract Data With No-Code
 - ✨ Handle Pagination & Scrolling
@@ -136,9 +132,11 @@ BYOP (Bring Your Own Proxy) lets you connect external proxies to bypass anti-bot
 - ✨ Adapt To Website Layout Changes
 - ✨ Extract Behind Login
 - ✨ Integrations
-- ✨ MCP Server
-- ✨ Bypass 2FA & MFA For Extract Behind Login (coming soon)
-- +++ A lot of amazing things!
+- ✨ MCP
+
+# Use Cases
+Maxun can be used for various use-cases, including lead generation, market research, content aggregation and more.
+View use-cases in detail here: https://www.maxun.dev/#usecases
 
 # Screenshots
 ![Maxun PH Launch (1)-1-1](https://github.com/user-attachments/assets/d7c75fa2-2bbc-47bb-a5f6-0ee6c162f391)
2 changes: 1 addition & 1 deletion maxun-core/package.json
@@ -1,6 +1,6 @@
 {
   "name": "maxun-core",
-  "version": "0.0.23",
+  "version": "0.0.24",
   "description": "Core package for Maxun, responsible for data extraction",
   "main": "build/index.js",
   "typings": "build/index.d.ts",
139 changes: 121 additions & 18 deletions maxun-core/src/interpret.ts
@@ -123,6 +123,13 @@ export default class Interpreter extends EventEmitter {
     this.isAborted = true;
   }
 
+  /**
+   * Returns the current abort status
+   */
+  public getIsAborted(): boolean {
+    return this.isAborted;
+  }
+
   private async applyAdBlocker(page: Page): Promise<void> {
     if (this.blocker) {
       try {
@@ -610,6 +617,13 @@ export default class Interpreter extends EventEmitter {
 
           if (methodName === 'waitForLoadState') {
             try {
+              let args = step.args;
+
+              if (Array.isArray(args) && args.length === 1) {
+                args = [args[0], { timeout: 30000 }];
+              } else if (!Array.isArray(args)) {
+                args = [args, { timeout: 30000 }];
+              }
               await executeAction(invokee, methodName, step.args);
             } catch (error) {
               await executeAction(invokee, methodName, 'domcontentloaded');
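
The default-timeout handling added in this hunk can be read as a small pure function. The sketch below is illustrative only: `normalizeLoadStateArgs` is a hypothetical name that does not appear in maxun-core, and it assumes `step.args` is either a bare load-state value or an array. In the hunk, the normalized `args` is computed while the call on the next context line still receives `step.args`; the sketch returns the normalized value so that a caller could pass it on.

```typescript
// Hypothetical helper, not part of maxun-core: mirrors the normalization
// performed in the hunk above as a pure function.
function normalizeLoadStateArgs(raw: unknown, defaultTimeout = 30000): unknown[] {
  // A bare value such as 'networkidle' becomes [state, { timeout }].
  if (!Array.isArray(raw)) {
    return [raw, { timeout: defaultTimeout }];
  }
  // A one-element array gets the default options appended.
  if (raw.length === 1) {
    return [raw[0], { timeout: defaultTimeout }];
  }
  // Anything else (empty, or already carrying options) is passed through.
  return raw;
}

console.log(normalizeLoadStateArgs('networkidle'));
console.log(normalizeLoadStateArgs(['load', { timeout: 5000 }]));
```

Keeping the normalization separate from the call makes it easy to check which argument list actually reaches `waitForLoadState`.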
@@ -670,7 +684,19 @@ export default class Interpreter extends EventEmitter {
           return;
         }
 
-        const results = await page.evaluate((cfg) => window.scrapeList(cfg), config);
+        const evaluationPromise = page.evaluate((cfg) => window.scrapeList(cfg), config);
+        const timeoutPromise = new Promise<any[]>((_, reject) =>
+          setTimeout(() => reject(new Error('Page evaluation timeout')), 10000)
+        );
+
+        let results;
+        try {
+          results = await Promise.race([evaluationPromise, timeoutPromise]);
+        } catch (error) {
+          debugLog(`Page evaluation failed: ${error.message}`);
+          return;
+        }
+
         const newResults = results.filter(item => {
           const uniqueKey = JSON.stringify(item);
           if (scrapedItems.has(uniqueKey)) return false;
@@ -691,43 +717,94 @@ export default class Interpreter extends EventEmitter {
       return false;
     };
 
+    // Helper function to detect if a selector is XPath
+    const isXPathSelector = (selector: string): boolean => {
+      return selector.startsWith('//') ||
+        selector.startsWith('/') ||
+        selector.startsWith('./') ||
+        selector.includes('contains(@') ||
+        selector.includes('[count(') ||
+        selector.includes('@class=') ||
+        selector.includes('@id=') ||
+        selector.includes(' and ') ||
+        selector.includes(' or ');
+    };
+
+    // Helper function to wait for selector (CSS or XPath)
+    const waitForSelectorUniversal = async (selector: string, options: any = {}): Promise<ElementHandle | null> => {
+      try {
+        if (isXPathSelector(selector)) {
+          // Use XPath locator
+          const locator = page.locator(`xpath=${selector}`);
+          await locator.waitFor({
+            state: 'attached',
+            timeout: options.timeout || 10000
+          });
+          return await locator.elementHandle();
+        } else {
+          // Use CSS selector
+          return await page.waitForSelector(selector, {
+            state: 'attached',
+            timeout: options.timeout || 10000
+          });
+        }
+      } catch (error) {
+        return null;
+      }
+    };
+
     // Enhanced button finder with retry mechanism
-    const findWorkingButton = async (selectors: string[]): Promise<{
-      button: ElementHandle | null,
+    const findWorkingButton = async (selectors: string[]): Promise<{
+      button: ElementHandle | null,
       workingSelector: string | null,
       updatedSelectors: string[]
     }> => {
-      let updatedSelectors = [...selectors];
-
+      const startTime = Date.now();
+      const MAX_BUTTON_SEARCH_TIME = 15000;
+      let updatedSelectors = [...selectors];
+
       for (let i = 0; i < selectors.length; i++) {
+        if (Date.now() - startTime > MAX_BUTTON_SEARCH_TIME) {
+          debugLog(`Button search timeout reached (${MAX_BUTTON_SEARCH_TIME}ms), aborting`);
+          break;
+        }
         const selector = selectors[i];
         let retryCount = 0;
         let selectorSuccess = false;
 
         while (retryCount < MAX_RETRIES && !selectorSuccess) {
           try {
-            const button = await page.waitForSelector(selector, {
-              state: 'attached',
-              timeout: 10000
-            });
-
+            const button = await waitForSelectorUniversal(selector, { timeout: 2000 });
+
             if (button) {
               debugLog('Found working selector:', selector);
-              return {
-                button,
+              return {
+                button,
                 workingSelector: selector,
-                updatedSelectors
+                updatedSelectors
               };
+            } else {
+              retryCount++;
+              debugLog(`Selector "${selector}" not found: attempt ${retryCount}/${MAX_RETRIES}`);
+
+              if (retryCount < MAX_RETRIES) {
+                await page.waitForTimeout(RETRY_DELAY);
+              } else {
+                debugLog(`Removing failed selector "${selector}" after ${MAX_RETRIES} attempts`);
+                updatedSelectors = updatedSelectors.filter(s => s !== selector);
+                selectorSuccess = true;
+              }
             }
           } catch (error) {
             retryCount++;
-            debugLog(`Selector "${selector}" failed: attempt ${retryCount}/${MAX_RETRIES}`);
+            debugLog(`Selector "${selector}" error: attempt ${retryCount}/${MAX_RETRIES} - ${error.message}`);
 
             if (retryCount < MAX_RETRIES) {
               await page.waitForTimeout(RETRY_DELAY);
             } else {
               debugLog(`Removing failed selector "${selector}" after ${MAX_RETRIES} attempts`);
               updatedSelectors = updatedSelectors.filter(s => s !== selector);
+              selectorSuccess = true;
             }
           }
         }
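
The button search in this hunk combines two limits: a per-selector retry count and an overall time budget. A minimal sketch of that control flow, detached from Playwright, is shown below. All names here (`findFirstWorking`, `Probe`, `SearchResult`) are hypothetical, the probe stands in for `waitForSelectorUniversal`, and the delay between retries is omitted for brevity.

```typescript
// Hypothetical probe: resolves to a handle, or to null when nothing matched.
// A rejection is treated the same as a miss.
type Probe<T> = (selector: string) => Promise<T | null>;

interface SearchResult<T> {
  found: T | null;
  workingSelector: string | null;
  remaining: string[]; // selectors that were not discarded
}

async function findFirstWorking<T>(
  selectors: string[],
  probe: Probe<T>,
  maxRetries = 3,
  budgetMs = 15000,
): Promise<SearchResult<T>> {
  const start = Date.now();
  let remaining = [...selectors];

  for (const selector of selectors) {
    // Stop once the overall time budget is spent.
    if (Date.now() - start > budgetMs) break;

    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      const found = await probe(selector).catch(() => null);
      if (found !== null) {
        return { found, workingSelector: selector, remaining };
      }
    }
    // Every attempt failed: drop this selector from the list.
    remaining = remaining.filter((s) => s !== selector);
  }
  return { found: null, workingSelector: null, remaining };
}
```

The budget is only checked between selectors, so the worst case is roughly the budget plus one selector's full set of retries.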
@@ -1347,9 +1424,35 @@ export default class Interpreter extends EventEmitter {
   }
 
   private async ensureScriptsLoaded(page: Page) {
-    const isScriptLoaded = await page.evaluate(() => typeof window.scrape === 'function' && typeof window.scrapeSchema === 'function' && typeof window.scrapeList === 'function' && typeof window.scrapeListAuto === 'function' && typeof window.scrollDown === 'function' && typeof window.scrollUp === 'function');
-    if (!isScriptLoaded) {
-      await page.addInitScript({ path: path.join(__dirname, 'browserSide', 'scraper.js') });
+    try {
+      const evaluationPromise = page.evaluate(() =>
+        typeof window.scrape === 'function' &&
+        typeof window.scrapeSchema === 'function' &&
+        typeof window.scrapeList === 'function' &&
+        typeof window.scrapeListAuto === 'function' &&
+        typeof window.scrollDown === 'function' &&
+        typeof window.scrollUp === 'function'
+      );
+
+      const timeoutPromise = new Promise<boolean>((_, reject) =>
+        setTimeout(() => reject(new Error('Script check timeout')), 3000)
+      );
+
+      const isScriptLoaded = await Promise.race([
+        evaluationPromise,
+        timeoutPromise
+      ]);
+
+      if (!isScriptLoaded) {
+        await page.addInitScript({ path: path.join(__dirname, 'browserSide', 'scraper.js') });
+      }
+    } catch (error) {
+      this.log(`Script check failed, adding script anyway: ${error.message}`, Level.WARN);
+      try {
+        await page.addInitScript({ path: path.join(__dirname, 'browserSide', 'scraper.js') });
+      } catch (scriptError) {
+        this.log(`Failed to add script: ${scriptError.message}`, Level.ERROR);
+      }
     }
   }
 
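
Two hunks in this file race a `page.evaluate` call against a timer with `Promise.race`. A minimal sketch of that pattern as a reusable helper is shown below. `withTimeout` is a hypothetical name, not something defined in maxun-core. Unlike the inline versions in the diff, this sketch clears the timer once the race settles.

```typescript
// Hypothetical helper: rejects with "<label> timeout" if `work` has not
// settled within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timeout`)), ms);
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer so it does not linger after the race is over.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that `Promise.race` only stops waiting; it does not cancel the losing promise, so a slow evaluation keeps running in the page after the timeout fires.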
6 changes: 4 additions & 2 deletions package.json
@@ -1,6 +1,6 @@
 {
   "name": "maxun",
-  "version": "0.0.23",
+  "version": "0.0.24",
   "author": "Maxun",
   "license": "AGPL-3.0-or-later",
   "dependencies": {
@@ -11,6 +11,7 @@
     "@mui/lab": "^5.0.0-alpha.80",
     "@mui/material": "^5.6.2",
     "@react-oauth/google": "^0.12.1",
+    "@tanstack/react-query": "^5.90.2",
     "@testing-library/react": "^13.1.1",
     "@testing-library/user-event": "^13.5.0",
     "@types/bcrypt": "^5.0.2",
@@ -50,7 +51,7 @@
     "lodash": "^4.17.21",
     "loglevel": "^1.8.0",
     "loglevel-plugin-remote": "^0.6.8",
-    "maxun-core": "^0.0.23",
+    "maxun-core": "^0.0.24",
     "minio": "^8.0.1",
     "moment-timezone": "^0.5.45",
     "node-cron": "^3.0.3",
@@ -129,6 +130,7 @@
     "ajv": "^8.8.2",
     "concurrently": "^7.0.0",
     "cross-env": "^7.0.3",
+    "esbuild": "^0.25.10",
     "js-cookie": "^3.0.5",
     "nodemon": "^2.0.15",
     "sequelize-cli": "^6.6.2",
4 changes: 2 additions & 2 deletions server/src/api/record.ts
@@ -710,8 +710,8 @@ async function executeRun(id: string, userId: string) {
       retries: 5,
     };
 
-    processAirtableUpdates();
-    processGoogleSheetUpdates();
+    processAirtableUpdates().catch(err => logger.log('error', `Airtable update error: ${err.message}`));
+    processGoogleSheetUpdates().catch(err => logger.log('error', `Google Sheets update error: ${err.message}`));
   } catch (err: any) {
     logger.log('error', `Failed to update Google Sheet for run: ${plainRun.runId}: ${err.message}`);
   }
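
The change above keeps both integration updates as fire-and-forget calls but attaches a `.catch` to each. The surrounding `try`/`catch` cannot see a rejection from a promise that is never awaited, so without the `.catch` a failure would surface as an unhandled rejection. A minimal sketch of the same idea as a helper follows. `runDetached` and `Logger` are hypothetical names, and the log format simply mimics the messages in the diff.

```typescript
// Hypothetical logger signature, loosely modelled on logger.log(level, message).
type Logger = (level: 'error' | 'info', message: string) => void;

function runDetached(name: string, task: () => Promise<void>, log: Logger): void {
  // Schedule the task without awaiting it. Starting from a resolved promise
  // means a synchronous throw inside `task` is also routed to the handler.
  Promise.resolve()
    .then(task)
    .catch((err: unknown) => {
      const message = err instanceof Error ? err.message : String(err);
      log('error', `${name} update error: ${message}`);
    });
}

// Usage: the caller continues immediately; the failure is only logged.
runDetached('Airtable', async () => { throw new Error('rate limited'); }, (level, message) => {
  console.log(`[${level}] ${message}`);
});
```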
3 changes: 2 additions & 1 deletion server/src/browser-management/classes/RemoteBrowser.ts
@@ -380,6 +380,7 @@ export class RemoteBrowser {
       );
 
       await this.currentPage.mouse.wheel(data.deltaX, data.deltaY);
+      await this.currentPage.waitForLoadState("networkidle", { timeout: 5000 });
 
       const scrollInfo = await this.currentPage.evaluate(() => ({
         x: window.scrollX,
@@ -1590,7 +1591,7 @@ export class RemoteBrowser {
         }
 
         return window.rrwebSnapshot.snapshot(document, {
-          inlineImages: true,
+          inlineImages: false,
           collectFonts: true,
         });
       });
39 changes: 34 additions & 5 deletions server/src/browser-management/controller.ts
@@ -77,7 +77,11 @@ export const createRemoteBrowserForRun = (userId: string): string => {
 
   logger.log('info', `createRemoteBrowserForRun: Reserved slot ${id} for user ${userId}`);
 
-  initializeBrowserAsync(id, userId);
+  initializeBrowserAsync(id, userId)
+    .catch((error: any) => {
+      logger.log('error', `Unhandled error in initializeBrowserAsync for browser ${id}: ${error.message}`);
+      browserPool.failBrowserSlot(id);
+    });
 
   return id;
 };
@@ -110,7 +114,16 @@ export const destroyRemoteBrowser = async (id: string, userId: string): Promise<
     } catch (switchOffError) {
       logger.log('warn', `Error switching off browser ${id}: ${switchOffError}`);
     }
-
+
+    try {
+      const namespace = io.of(id);
+      namespace.removeAllListeners();
+      namespace.disconnectSockets(true);
+      logger.log('debug', `Cleaned up socket namespace for browser ${id}`);
+    } catch (namespaceCleanupError: any) {
+      logger.log('warn', `Error cleaning up socket namespace for browser ${id}: ${namespaceCleanupError.message}`);
+    }
+
     return browserPool.deleteRemoteBrowser(id);
   } catch (error) {
     const errorMessage = error instanceof Error ? error.message : String(error);
@@ -273,11 +286,27 @@ const initializeBrowserAsync = async (id: string, userId: string) => {
     }
 
     logger.log('debug', `Starting browser initialization for ${id}`);
-    await browserSession.initialize(userId);
-    logger.log('debug', `Browser initialization completed for ${id}`);
-
+
+    try {
+      await browserSession.initialize(userId);
+      logger.log('debug', `Browser initialization completed for ${id}`);
+    } catch (initError: any) {
+      try {
+        await browserSession.switchOff();
+        logger.log('info', `Cleaned up failed browser initialization for ${id}`);
+      } catch (cleanupError: any) {
+        logger.log('error', `Failed to cleanup browser ${id}: ${cleanupError.message}`);
+      }
+      throw initError;
+    }
+
     const upgraded = browserPool.upgradeBrowserSlot(id, browserSession);
     if (!upgraded) {
+      try {
+        await browserSession.switchOff();
+      } catch (cleanupError: any) {
+        logger.log('error', `Failed to cleanup browser after slot upgrade failure: ${cleanupError.message}`);
+      }
       throw new Error('Failed to upgrade reserved browser slot');
     }
 
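
The last hunk wraps initialization so that a failed session is switched off before the original error is rethrown, and a cleanup failure is logged without replacing that error. A minimal sketch of this shape, independent of the real `RemoteBrowser` class, follows. `BrowserSessionLike` and `initializeOrCleanUp` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical stand-in for the two session methods used in the hunk.
interface BrowserSessionLike {
  initialize(): Promise<void>;
  switchOff(): Promise<void>;
}

async function initializeOrCleanUp(
  session: BrowserSessionLike,
  log: (message: string) => void,
): Promise<void> {
  try {
    await session.initialize();
  } catch (initError) {
    // Best-effort cleanup: a cleanup failure is logged but never masks initError.
    try {
      await session.switchOff();
    } catch (cleanupError) {
      const message = cleanupError instanceof Error ? cleanupError.message : String(cleanupError);
      log(`Failed to cleanup browser: ${message}`);
    }
    throw initError;
  }
}
```

Rethrowing the original error lets the caller's `.catch` (added in the first hunk of this file) mark the reserved slot as failed.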
10 changes: 8 additions & 2 deletions server/src/pgboss-worker.ts
@@ -102,8 +102,8 @@ async function triggerIntegrationUpdates(runId: string, robotMetaId: string): Pr
       retries: 5,
     };
 
-    processAirtableUpdates();
-    processGoogleSheetUpdates();
+    processAirtableUpdates().catch(err => logger.log('error', `Airtable update error: ${err.message}`));
+    processGoogleSheetUpdates().catch(err => logger.log('error', `Google Sheets update error: ${err.message}`));
   } catch (err: any) {
     logger.log('error', `Failed to update integrations for run: ${runId}: ${err.message}`);
   }
@@ -333,6 +333,12 @@ async function processRunExecution(job: Job<ExecuteRunData>) {
       // Schedule updates for Google Sheets and Airtable
       await triggerIntegrationUpdates(plainRun.runId, plainRun.robotMetaId);
 
+      // Flush any remaining persistence buffer before emitting socket event
+      if (browser && browser.interpreter) {
+        await browser.interpreter.flushPersistenceBuffer();
+        logger.log('debug', `Flushed persistence buffer before emitting run-completed for run ${data.runId}`);
+      }
+
       const completionData = {
         runId: data.runId,
         robotMetaId: plainRun.robotMetaId,